
Add per-table byte size metrics (lean_table_bytes) #244

Open
pablodeymo wants to merge 4 commits into main from add-table-bytes-metrics

Conversation


@pablodeymo pablodeymo commented Mar 18, 2026

Summary

  • Expose estimated byte size of each storage table as a Prometheus gauge (lean_table_bytes{table="..."}).
  • Uses RocksDB's estimate-live-data-size property — an O(1) metadata lookup per column family, no scanning.
  • Provides visibility into storage growth for improving pruning decisions.

How it works

  • StorageBackend trait gains estimate_table_bytes(table) with a default returning 0.
  • RocksDBBackend implements it via property_int_value_cf("rocksdb.estimate-live-data-size").
  • Store::estimate_table_bytes() passes through to the backend.
  • Prometheus lean_table_bytes IntGaugeVec with a table label, updated after each block is processed.

No startup scans, no atomic counters, no insert/prune instrumentation needed.
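The trait-plus-default shape described above can be sketched as follows. The `Table` variants and `InMemoryBackend` name are illustrative, and the RocksDB override is shown only as a comment since this sketch has no database dependency:

```rust
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Table {
    States,
    BlockHeaders,
}

trait StorageBackend {
    /// Estimated live bytes for one table. The default of 0 keeps
    /// non-RocksDB backends (e.g. in-memory) source-compatible.
    fn estimate_table_bytes(&self, _table: Table) -> u64 {
        0
    }
}

struct InMemoryBackend;
impl StorageBackend for InMemoryBackend {} // inherits the 0 default

// A RocksDB backend would override it roughly like this (rust-rocksdb):
//   fn estimate_table_bytes(&self, table: Table) -> u64 {
//       self.db
//           .cf_handle(cf_name(table))
//           .and_then(|cf| {
//               self.db
//                   .property_int_value_cf(cf, "rocksdb.estimate-live-data-size")
//                   .ok()
//                   .flatten()
//           })
//           .unwrap_or(0)
//   }

fn main() {
    let backend = InMemoryBackend;
    println!("{}", backend.estimate_table_bytes(Table::States)); // prints 0
}
```

The default return of 0 is what lets the trait change land without touching every backend.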

Tables tracked

| Label | Description |
| --- | --- |
| states | Beacon states by root (largest — XMSS signatures) |
| block_headers | Block headers by root |
| block_bodies | Block bodies by root |
| block_signatures | Block signatures with attestation |
| gossip_signatures | Individual validator gossip signatures |
| attestation_data_by_root | Content-addressed attestation data |
| live_chain | Fork choice slot index |
| metadata | Store metadata (head, config, checkpoints) |

Files changed

  • crates/storage/src/api/traits.rs — estimate_table_bytes on StorageBackend trait
  • crates/storage/src/backend/rocksdb.rs — RocksDB implementation via property_int_value_cf
  • crates/storage/src/api/tables.rs — Table::name() for metric labels
  • crates/storage/src/store.rs — estimate_table_bytes() pass-through
  • crates/blockchain/src/metrics.rs — lean_table_bytes IntGaugeVec
  • crates/blockchain/src/lib.rs — emit table bytes after each block processed

How to Test

  • Run a local devnet and query GET :5054/metrics
  • Verify lean_table_bytes{table="states"}, lean_table_bytes{table="block_headers"}, etc. are present
  • Values reflect RocksDB's estimate of live data per column family
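To eyeball the scrape, the gauge lines can be filtered with grep. The payload below is an illustrative sample, not captured from a real node; against a live devnet the same filter would be fed from `curl -s http://localhost:5054/metrics`:

```shell
# Illustrative metrics payload (values made up)
metrics='lean_table_bytes{table="states"} 10485760
lean_table_bytes{table="block_headers"} 524288
some_other_metric 1'

# Count the per-table gauge lines
printf '%s\n' "$metrics" | grep -c '^lean_table_bytes'   # prints 2
```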

Track the byte size (key + value) of each storage table as a live
Prometheus gauge. This provides visibility into storage growth and
is a prerequisite for improving pruning decisions.

The approach uses one AtomicU64 per table in the Store struct:
- On startup, a full scan seeds the counters from existing data.
- On insert, the SSZ-encoded key+value size is added.
- On prune/delete, the size of removed entries is subtracted.

Exposed as lean_table_bytes{table="states"}, lean_table_bytes{table="block_headers"}, etc.
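The counter scheme from that commit can be sketched like this. All names are hypothetical stand-ins for the real `Store` fields, and `TABLE_COUNT` is shrunk to 2 for brevity:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const TABLE_COUNT: usize = 2; // illustrative; the PR tracks 8 tables

// One counter per table, adjusted on insert and prune as described above.
struct TableBytes {
    counters: [AtomicU64; TABLE_COUNT],
}

impl TableBytes {
    fn new() -> Self {
        Self { counters: [AtomicU64::new(0), AtomicU64::new(0)] }
    }

    // On insert: add the encoded key+value size.
    fn add(&self, table: usize, key_len: usize, value_len: usize) {
        self.counters[table].fetch_add((key_len + value_len) as u64, Ordering::Relaxed);
    }

    // On prune/delete: subtract, saturating instead of wrapping in case
    // the accounting ever drifts below zero.
    fn sub(&self, table: usize, key_len: usize, value_len: usize) {
        let bytes = (key_len + value_len) as u64;
        self.counters[table]
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
                Some(v.saturating_sub(bytes))
            })
            .ok();
    }

    fn get(&self, table: usize) -> u64 {
        self.counters[table].load(Ordering::Relaxed)
    }
}

fn main() {
    let tb = TableBytes::new();
    tb.add(0, 32, 968); // insert: 32-byte key + 968-byte value
    tb.sub(0, 32, 468); // prune a smaller entry
    println!("{}", tb.get(0)); // prints 500
}
```

As the reviews below point out, this design only stays correct if every overwrite and every metadata write adjusts the counters by a delta, which is the main source of the bugs flagged.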
@github-actions

🤖 Codex Code Review

  1. The new byte accounting is incorrect for overwrites, and these code paths do overwrite existing keys. Both backends implement put_batch as an upsert, not an append, so blindly fetch_add-ing written bytes will overcount whenever a key is rewritten. The clearest case is the documented pending-block flow: insert_pending_block is explicitly followed by an idempotent overwrite in insert_signed_block (crates/storage/src/store.rs:801), but both paths add header/body/signature bytes again at crates/storage/src/store.rs:789, :814, :815, and :816. The overwrite semantics are visible in crates/storage/src/backend/in_memory.rs:99 and crates/storage/src/backend/rocksdb.rs:112. This will make lean_table_bytes drift upward over time. The fix should compute a byte delta against any existing value, not just add the new serialized size.

  2. Metadata is included in the startup scan and exported as a metric, but it is never kept in sync afterward. scan_table_bytes counts all tables, including Metadata, and process_block exports every table. But both set_metadata and the batched metadata writes in update_checkpoints bypass the new counters entirely. Since head, justified/finalized checkpoints, time, and safe_target all live in Metadata, lean_table_bytes{table="metadata"} becomes stale almost immediately after startup.

  3. The metric update happens before deferred pruning, so exported table sizes can stay stale until another block arrives. process_block updates the gauges at crates/blockchain/src/lib.rs:288, but the caller prunes old states/blocks later in on_block. Those prune paths do adjust the in-memory counters in storage, e.g. crates/storage/src/store.rs:685 and crates/storage/src/store.rs:759, but the Prometheus gauge is not refreshed afterward. If the node processes a block and then goes idle, the exported table sizes remain wrong.

No consensus-layer correctness, SSZ, or signature-validation regressions stood out here; the PR is operational/metrics-focused. Main gap is correctness of the new accounting itself, and I would want tests covering overwrite/idempotent writes, metadata updates, and post-prune metric refresh before merging.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Kimi Code Review

The PR adds per-table storage byte metrics to track database growth. Overall the approach is sound, but there are several correctness and performance issues to address.

Critical Issues

1. Incorrect byte accounting on gossip signature upserts (crates/storage/src/store.rs, lines 1178–1199)
The insert_gossip_signature method checks if a key exists to avoid inflating the entry counter, but it fails to adjust the byte counter when updating an existing entry. If the signature size changes (e.g., different XMSS scheme or serialization), the metric will drift.

Fix by calculating the delta:

// new_bytes = (encoded_key.len() + new value's len()) as u64, computed above
if let Some(old_val) = old_value_len {
    let old_bytes = (encoded_key.len() + old_val.len()) as u64;
    self.sub_table_bytes(Table::GossipSignatures, old_bytes);
    self.add_table_bytes(Table::GossipSignatures, new_bytes);
} else {
    self.gossip_signatures_count.fetch_add(1, Ordering::Relaxed);
    self.add_table_bytes(Table::GossipSignatures, new_bytes);
}

2. Unhandled overwrites in other insert methods (insert_state, insert_attestation_data_by_root, insert_attestation_data_batch)
These methods unconditionally add bytes without checking for existing entries. While block roots and attestation roots should be unique in correct operation, defensive programming suggests either:

  • Checking for existing entries and adjusting the delta (safer)
  • Documenting the uniqueness assumption so the unconditional add is clearly intentional

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



Review: Add per-table byte size metrics (lean_table_bytes)

The overall approach is sound — AtomicU64 per table seeded at startup and adjusted on every mutation is the right pattern for this. Most of the implementation is clean and correct. A few issues to address.


Bug: Double-counting for blocks going through the two-phase insert flow

crates/storage/src/store.rs:783–791 and :801–817

The doc comment on insert_pending_block explicitly says:

"When the block is later processed via insert_signed_block, the same keys are overwritten (idempotent)."

Both methods call write_signed_block for the same root and both increment the byte counters. The upsert means disk bytes don't grow on the second write, but the counters do — so block_headers, block_bodies, and block_signatures are double-counted for every block that flows through gossip receipt → validation. This will cause the gauges to drift upward indefinitely.

Fix: in insert_signed_block, skip the add_table_bytes calls for the three block tables if the header already exists (i.e., the block came through insert_pending_block first). Alternatively, subtract the existing size before adding the new size.
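The "subtract the existing size before adding the new size" variant could look roughly like this. It is a self-contained sketch with a HashMap standing in for the backend, and `upsert_tracked` is a hypothetical name, not code from the PR:

```rust
use std::collections::HashMap;

// Upsert that keeps a byte counter consistent under overwrites: the
// counter moves by (new size - old size), not by the new size alone.
fn upsert_tracked(
    map: &mut HashMap<Vec<u8>, Vec<u8>>,
    bytes: &mut u64,
    key: Vec<u8>,
    value: Vec<u8>,
) {
    let new = (key.len() + value.len()) as u64;
    let old = map
        .get(&key)
        .map(|v| (key.len() + v.len()) as u64)
        .unwrap_or(0);
    *bytes = bytes.saturating_sub(old) + new;
    map.insert(key, value);
}

fn main() {
    let mut map = HashMap::new();
    let mut bytes = 0u64;
    upsert_tracked(&mut map, &mut bytes, b"root".to_vec(), vec![0u8; 96]); // 4 + 96
    // Idempotent overwrite (the pending-block flow): counter must not grow.
    upsert_tracked(&mut map, &mut bytes, b"root".to_vec(), vec![0u8; 96]);
    println!("{bytes}"); // prints 100
}
```

With delta accounting, the gossip-receipt → validation double write becomes a no-op for the counter instead of a permanent upward drift.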


Bug: Metadata table not tracked for writes

crates/storage/src/store.rs:404–410

set_metadata writes to Table::Metadata but never calls add_table_bytes. This is called on every set_time(), set_head(), update_checkpoints(), etc. The startup scan seeds the correct initial value, but every subsequent write is untracked. The gauge will be stale immediately after the first block is processed.

Since metadata keys are fixed-size and values are small, the drift won't be large, but it will diverge from reality. set_metadata should call add_table_bytes for the written bytes (and subtract the old value's size if overwriting an existing key).


Potential underflow wrap in sub_table_bytes

crates/storage/src/store.rs (new sub_table_bytes helper)

AtomicU64::fetch_sub wraps on underflow. If the counter ever goes below zero (due to the double-counting bug above or any future inconsistency), it wraps to near u64::MAX. The downstream set(bytes as i64) cast would then produce a large negative number in Prometheus, which is harder to notice than an obviously wrong positive value. This is unlikely in steady state but worth hardening with a saturating subtraction:

fn sub_table_bytes(&self, table: Table, bytes: u64) {
    self.table_bytes[table.index()].fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
        Some(v.saturating_sub(bytes))
    }).ok();
}

Minor: Misleading variable name in insert_gossip_signature

crates/storage/src/store.rs (upsert check)

let old_value_len = self
    .backend
    .begin_read()
    .expect("read view")
    .get(Table::GossipSignatures, &encoded_key)
    .expect("get");
let is_new = old_value_len.is_none();

old_value_len holds the full value bytes (Option<Vec<u8>>), not a length. The name implies intent to use the length (subtract old, add new) but only .is_none() is checked. Rename to existing or old_value, or inline the check as it was before.

On the same note: for upsert updates to an existing gossip signature, the counter is not adjusted. XMSS signatures are fixed-size so this is safe in practice, but it's fragile — a comment noting the fixed-size assumption would prevent a future regression.


Minor: TABLE_COUNT and Table::index() are manually synced

Adding a new Table variant requires updating TABLE_COUNT, ALL_TABLES, Table::index(), and Table::name() — four places. A mismatch in TABLE_COUNT would be caught at compile time (array size mismatch), but a mismatch in Table::index() vs ALL_TABLES ordering would silently assign bytes to the wrong table. Consider a compile-time or test assertion:

#[cfg(test)]
#[test]
fn table_index_matches_all_tables_order() {
    for (i, table) in ALL_TABLES.iter().enumerate() {
        assert_eq!(table.index(), i, "index() out of sync with ALL_TABLES for {:?}", table);
    }
}

Good

  • The BlockWriteBytes struct cleanly returns per-table write sizes from write_signed_block without further coupling.
  • prune_by_slot now returns (count, bytes) and handles the subtraction internally — callers that don't need the byte count use _ idiomatically.
  • scan_table_bytes correctly ignores the Metadata table's dynamic nature because it seeds all 8 tables; the startup numbers are correct.
  • Ordering::Relaxed is appropriate for metrics gauges where slight staleness is acceptable.

Automated review by Claude (Anthropic) · sonnet · custom prompt


greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR adds per-table Prometheus gauges (lean_table_bytes{table="..."}) to expose estimated live byte sizes for all 8 RocksDB column families. It avoids manual byte-counting entirely by delegating to RocksDB's built-in rocksdb.estimate-live-data-size property — an O(1) metadata lookup — queried once per block processed. The implementation is clean and avoids the wrapping/drift issues that plague manually-tracked counters.

  • Table::name() introduced for human-readable Prometheus label strings (exhaustive match, compiler-enforced).
  • StorageBackend::estimate_table_bytes() trait method with a default 0 return keeps in-memory backends source-compatible.
  • RocksDBBackend implements it via property_int_value_cf, falling back to 0 gracefully if a CF handle is missing.
  • Store::estimate_table_bytes() is a thin pass-through with no added state.
  • update_table_bytes in metrics.rs uses the same LazyLock<IntGaugeVec> pattern as all other metrics in the file; the only minor inconsistency is bytes as i64 rather than the saturating .try_into().unwrap_or(i64::MAX) used by peer metrics.

Confidence Score: 5/5

  • This PR is safe to merge; it adds read-only observability with no writes, no new state, and no impact on consensus logic.
  • The approach is minimal and correct: it delegates to a well-known O(1) RocksDB property, uses a compiler-enforced exhaustive match for table names, and introduces zero manual counter management. The single flagged issue (bytes as i64 vs try_into) is a cosmetic inconsistency with negligible real-world impact.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/src/metrics.rs | Adds update_table_bytes function using a LazyLock<IntGaugeVec> pattern consistent with the rest of the file. Minor: bytes as i64 cast should use try_into().unwrap_or(i64::MAX) to match the saturating-conversion pattern used by other metrics in the same file. |
| crates/storage/src/api/tables.rs | Adds Table::name() returning human-readable metric label strings. Exhaustive match ensures the compiler will catch any missing variant. Strings are identical to cf_name() in rocksdb.rs but serve a distinct purpose (metric labels vs stable CF names), so the duplication is intentional. |
| crates/storage/src/api/traits.rs | Adds estimate_table_bytes to StorageBackend with a sensible default of 0, keeping in-memory backends compatible without changes. Clean, minimal, non-breaking addition. |
| crates/storage/src/backend/rocksdb.rs | Implements estimate_table_bytes via property_int_value_cf("rocksdb.estimate-live-data-size") — an O(1) metadata-only lookup. Gracefully falls back to 0 if the CF handle is not found. Correct and idiomatic. |
| crates/storage/src/store.rs | Minimal pass-through estimate_table_bytes delegating directly to the backend. No state management needed since the approach relies on RocksDB's native property rather than manual counters. |
| crates/blockchain/src/lib.rs | Iterates over all 8 tables and emits lean_table_bytes after each block is processed. The 8 O(1) RocksDB property reads per block add negligible overhead. Straightforward integration. |
| crates/storage/src/lib.rs | Adds ALL_TABLES to the crate's public API surface so the blockchain crate can iterate over all tables. One-line change, correct. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant BC as BlockChainServer
    participant Store as Store
    participant Backend as RocksDBBackend
    participant RDB as RocksDB

    BC->>BC: process_block(signed_block)
    loop for each table in ALL_TABLES
        BC->>Store: estimate_table_bytes(table)
        Store->>Backend: estimate_table_bytes(table)
        Backend->>RDB: cf_handle(cf_name(table))
        RDB-->>Backend: ColumnFamily
        Backend->>RDB: property_int_value_cf(cf, "rocksdb.estimate-live-data-size")
        RDB-->>Backend: u64 bytes
        Backend-->>Store: u64 bytes
        Store-->>BC: u64 bytes
        BC->>BC: metrics::update_table_bytes(table.name(), bytes)
    end
```
Inline review comment
Path: crates/blockchain/src/metrics.rs
Line: 323

Comment:
**Silent negative metric for very large table sizes**

`bytes as i64` is a bitwise cast — for values above `i64::MAX` (~9.2 EB) it silently produces a **negative** Prometheus gauge value rather than panicking or clamping. While astronomically unlikely today, it is inconsistent with the pattern used elsewhere in this file (e.g. `update_validators_count` uses `.try_into().unwrap()`, `set_attestation_committee_count` uses `.try_into().unwrap_or_default()`). A negative byte-size metric would be confusing and hard to detect.

Prefer a saturating conversion to be consistent and safe:

```suggestion
        .set(bytes.try_into().unwrap_or(i64::MAX));
```


Last reviewed commit: "Use RocksDB estimate..."

- Extract sum_entry_bytes() helper to deduplicate byte-summing in
  prune_old_states, prune_old_blocks, and delete_gossip_signatures
- Merge count_gossip_signatures into scan_tables to avoid scanning
  the GossipSignatures table twice at startup
- Remove dead (usize, u64) return from prune_by_slot — callers
  never used the bytes value since sub_table_bytes is called inside
- Fix double-counting bug: insert_pending_block no longer tracks
  bytes since the same keys are overwritten by insert_signed_block
- Revert unnecessary old_value_len rename in insert_gossip_signature
Replace the AtomicU64 bookkeeping approach with a direct query to
RocksDB's per-column-family property. This eliminates:
- Startup table scan (scan_tables)
- Byte tracking on every insert and prune
- BlockWriteBytes struct, sum_entry_bytes helper, add/sub_table_bytes

The StorageBackend trait gains estimate_table_bytes(table) with a
default returning 0. RocksDB implements it via property_int_value_cf
on the "rocksdb.estimate-live-data-size" property — an O(1) metadata
lookup, no scanning required.
rocksdb.estimate-live-data-size only counts flushed SST files,
so the metric stays at 0 until RocksDB triggers a compaction.
Add rocksdb.cur-size-all-mem-tables to capture in-memory data too.
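Summing the two properties can be sketched as a small helper. Here `read_property` is a stand-in for rust-rocksdb's `property_int_value_cf` (which returns `Result<Option<u64>>`), already simplified to an `Option<u64>` so the sketch runs without a database:

```rust
// Combine on-disk (SST) and in-memory (memtable) estimates for one
// column family. `read_property` stands in for the real RocksDB getter.
fn table_bytes_estimate(read_property: impl Fn(&str) -> Option<u64>) -> u64 {
    let sst = read_property("rocksdb.estimate-live-data-size").unwrap_or(0);
    let mem = read_property("rocksdb.cur-size-all-mem-tables").unwrap_or(0);
    sst.saturating_add(mem)
}

fn main() {
    // Stub: a freshly started node with nothing flushed yet — the SST
    // estimate is 0 but the memtables already hold data.
    let unflushed = |prop: &str| match prop {
        "rocksdb.cur-size-all-mem-tables" => Some(4096),
        _ => Some(0),
    };
    println!("{}", table_bytes_estimate(unflushed)); // prints 4096
}
```

This keeps the metric nonzero between startup and the first flush, which is exactly the gap the commit message describes.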