release-metrics

Evaluate Lodestar release candidate readiness by comparing beta/RC metrics against stable. Use when deploying a new RC to beta nodes, reviewing metrics before cutting a release, or assessing whether a release candidate has regressions. Covers health checks, performance comparison, memory/resource analysis, validator effectiveness, networking quality, and PeerDAS-specific metrics. Requires Grafana/Prometheus access to the Lodestar monitoring stack.

lodekeeper 4 1 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add lodekeeper/dotfiles/release-metrics

Install via the SkillsCat registry.

SKILL.md

Release Metrics Acceptance

Evaluate whether a Lodestar release candidate is ready to ship by comparing RC nodes against
stable baseline. This skill codifies the team's release review process distilled from multiple
release cycles (v1.34–v1.40).

When to Run

After deploying an RC to beta (or feat) group nodes
Minimum 3 days soak time before release decision (shorter only if hotfix)
Before the team standup where the release decision is made
When asked to compare beta/RC vs stable metrics

Quick Start

Read references/prometheus-queries.md — contains all PromQL queries organized by category
Run queries against the Grafana Prometheus datasource comparing RC group vs stable group
Evaluate each metric category against the acceptance criteria below
Generate a release readiness report with a GO / NO-GO / INVESTIGATE recommendation

Metric Categories & Acceptance Criteria

1. Node Health (P0 — must pass)

Metric	Criteria	Action if Failed
Sync status	0 slots behind head	NO-GO
Finalization	Finalizing (distance ≤ 2 epochs)	NO-GO
Reorgs	Zero or same as stable	INVESTIGATE if higher
Peer count	150–250, stable (not spiky)	INVESTIGATE if volatile
Block processor queue	Empty (≤ 1)	INVESTIGATE if growing
Error rate	No new error types vs stable	INVESTIGATE

2. Attestation & Validator Performance (P0)

Metric	Criteria	Action if Failed
Head vote accuracy	≥ stable (compare 6h+ windows)	NO-GO if consistently worse
Target/source hit rate	≥ 99.5%	NO-GO if below
Wrong head ratio	≤ stable	NO-GO if regression
Inclusion distance	Avg ≈ 1 (compare to stable)	INVESTIGATE if higher
ATTESTER miss ratio	≤ stable	INVESTIGATE
TARGET miss ratio	≤ stable	INVESTIGATE

3. Block Processing (P0)

Metric	Criteria	Action if Failed
Block gossip → set as head	≤ stable avg (typically < 3s)	INVESTIGATE if > 4s
Process block time (avg)	≤ stable	INVESTIGATE if regression
Blocks set as head after 4s	Rate ≤ stable	INVESTIGATE
Process block count per slot	≈ 1	INVESTIGATE if consistently > 1
Epoch transition time	≤ stable	INVESTIGATE if regression
Epoch transitions per epoch	≈ 1	INVESTIGATE if > 1

4. Memory & Resources (P1)

Metric	Criteria	Action if Failed
RSS memory	Within ±20% of stable (same node type)	INVESTIGATE if higher
V8 heap used	Flat trend, no leaks, within ±20% of stable	INVESTIGATE
External memory	Flat, no growth trend	NO-GO if unbounded growth
Process heap bytes	Flat or declining after startup	INVESTIGATE if growing
GC pause rate	< 20% of slot time	INVESTIGATE if higher
CPU usage	Within ±30% of stable	INVESTIGATE
Disk usage	No abnormal growth rate	INVESTIGATE if > 85%

Important: Fresh processes have higher RSS than long-running ones (Node.js returns memory
to OS over time). Compare at similar uptimes — a 2-day-old beta vs 10-day-old stable will
show ~20-40% higher RSS naturally. This is NOT a regression.

5. Networking (P1)

Metric	Criteria	Action if Failed
Gossip validation job time	≤ stable	INVESTIGATE
Gossip validation dropped jobs	0 or ≤ stable	INVESTIGATE
Gossip block received delay	≤ stable	INVESTIGATE
Avg mesh peers (attestation subnets)	≥ stable	INVESTIGATE
Peer score distribution	No increase in negative-score peers	INVESTIGATE
Gossip RPCs tx/rx per sec	Similar to stable	INVESTIGATE
Dial errors / timeouts	≤ stable baseline	INVESTIGATE if spike
req/resp error rate	≤ stable	INVESTIGATE

6. DB & I/O (P1)

Metric	Criteria	Action if Failed
Archive blocks time	≤ stable (watch supernodes)	INVESTIGATE if > 10s
Hot-to-cold migration time	≤ stable	INVESTIGATE
DB write queue depth	Low, not growing	INVESTIGATE
Prometheus scrape duration	≤ stable	INVESTIGATE if slower

7. PeerDAS-Specific (P1, post-Fulu)

Metric	Criteria	Action if Failed
Custody column availability	Columns served matches custody groups	INVESTIGATE
Missing custody columns	Rate ≤ stable (counters grow but rate matters)	INVESTIGATE
Reconstructed columns	0 in steady state	INVESTIGATE if non-zero
Data column gossip time	≤ stable	INVESTIGATE
Column sampling success rate	≥ stable	INVESTIGATE

Node Type Comparison Matrix

Compare metrics across matching node types between RC and stable:

Type	Custody Groups	Key Focus
solo	4-8	Baseline: should match stable closely
semi	64	Memory scaling with custody
super	128	Stress test: memory, CPU, head votes
SAS (supernode+validator+EL)	128	Worst case: CPU, event loop, GC
arm64	4-8	Architecture: binary compatibility, perf parity
mainnet	varies	Real-world: actual block sizes, peer diversity

Key insight: Regressions in solo/semi that don't appear in super = likely not custody-related.
Regressions only in super/SAS = investigate custody group scaling.

Comparison Methodology

Same timeframe: Compare RC and stable over the same wall-clock period
Same hardware: Compare nodes on identical server specs (watch for Hetzner variance)
Rate interval: Use 6h or 12h for smoothed comparisons, 1h for recent trends
Multiple nodes: A regression must appear on ≥ 2 nodes of same type to be signal, not noise
Uptime normalization: Note uptime difference when comparing memory metrics
Check all networks: Don't forget gnosis/chiado — config edge cases lurk there

Report Template

# Release Readiness Report: v{VERSION}

**Date:** {date}
**RC deployed:** {rc_version} on {date}
**Soak time:** {days} days
**Groups compared:** {rc_group} vs {stable_group}

## Summary: {GO | NO-GO | NEEDS INVESTIGATION}

## Node Health
- Sync: ✅/❌
- Finalization: ✅/❌
- Reorgs: ✅/❌
- Peers: ✅/❌

## Performance
- Head vote accuracy: {rc}% vs {stable}% — ✅/❌
- Block gossip → head: {rc}s vs {stable}s — ✅/❌
- Epoch transition: {rc}ms vs {stable}ms — ✅/❌

## Resources
- RSS: {comparison table}
- Memory trend: {flat/growing/declining}
- GC: {rc}% vs {stable}%
- CPU: {rc} vs {stable} cores

## Networking
- Peers: {count}, stability: {stable/spiky}
- Gossip job time: ✅/❌
- Error rate: ✅/❌

## PeerDAS (if applicable)
- Custody columns: ✅/❌
- Missing rate: ✅/❌

## Items to Watch
1. {any metrics that need continued monitoring}

## Recommendation
{GO / NO-GO / INVESTIGATE with reasoning}

Additional Checks (Non-Metric)

These are not Prometheus metrics but are part of the release process:

Docker image size unchanged (no unintended bloat from dependency changes)
Binary builds work on amd64 and arm64
No breakage for downstream deployments (eth-docker, rocketpool, dappnode)
lodestar dev sanity check passes
No new deprecation warnings in logs
CI benchmarks show no performance regression

References

references/prometheus-queries.md — All PromQL queries by category
references/grafana-dashboards.md — Dashboard UIDs and panel descriptions

Self-Maintenance

If any commands, file paths, URLs, or configurations in this skill are outdated or no longer work, update this SKILL.md with the correct information after completing your current task. Skills should stay accurate and self-healing — fix what you find broken.

release-metrics

Resources

Install

Release Metrics Acceptance

When to Run

Quick Start

Metric Categories & Acceptance Criteria

1. Node Health (P0 — must pass)

2. Attestation & Validator Performance (P0)

3. Block Processing (P0)

4. Memory & Resources (P1)

5. Networking (P1)

6. DB & I/O (P1)

7. PeerDAS-Specific (P1, post-Fulu)

Node Type Comparison Matrix

Comparison Methodology

Report Template

Additional Checks (Non-Metric)

References

Self-Maintenance

Categories

Install

Recommended Skills