forest-tool archive merge #6733

@infoboy27

Describe the bug

After upgrading to a Forest build that includes the large-index CAR support from PR #6690, forest-tool archive merge can write very large multi-skip-frame CAR files (e.g. merged_batch_96.forest.car.zst, merged_batch_101.forest.car.zst), and the Forest daemon can successfully import and run from them. However, forest-tool archive merge later fails when those same files are used as input to subsequent merges:

Error: input not recognized as any kind of CAR data (.car, .car.zst, .forest.car)

So the daemon’s CAR reader accepts these multi-skip-frame archives, but the archive-merge CLI’s CAR detection/reader rejects them when they are used as input.

To reproduce
Go to a host with Forest built from main including commit 13cd7c3 (or use Docker image ghcr.io/chainsafe/forest:2026-03-06-13cd7c3) and prepare:

A lite snapshot (30k height), e.g.
snapshots/forest_snapshot_mainnet_2020-09-04_height_30000.forest.car.zst
A diff list file with 863 entries (3k-epoch diffs), e.g.
snapshots/diffs_from_30k.txt containing forest_diff_mainnet_..._height_30000+3000.forest.car.zst up to ..._2023-02-19_height_2616000+3000.forest.car.zst
Run forest-tool archive merge in batches (inside Docker) to produce a large intermediate snapshot, for example:

docker run --rm --entrypoint "" \
  -v file-coin_node-data:/data \
  -v /home/ubuntu/file-coin/snapshots:/snapshots:ro \
  ghcr.io/chainsafe/forest:2026-03-06-13cd7c3 \
  forest-tool archive merge --force \
  -o /data/merged_batch_96.forest.car.zst \
  /snapshots/forest_snapshot_mainnet_2020-09-04_height_30000.forest.car.zst \
  /snapshots/<diff_1> ... /snapshots/<diff_96>

This succeeds and produces a large CAR (merged_batch_96.forest.car.zst ~668 GB).
A later run with smaller batches also produces merged_batch_101.forest.car.zst (~665 GB).

Confirm that the daemon can import and run from these snapshots using the same image:

forest \
  --chain mainnet \
  --no-gc \
  --rpc-address 0.0.0.0:2345 \
  --import-snapshot /data/merged_batch_96.forest.car.zst \
  --healthcheck-address 0.0.0.0:2346 \
  --import-mode=copy \
  --req-window 32 \
  --target-peer-count 120

and similarly with:

forest --import-snapshot /data/merged_batch_101.forest.car.zst ...

The daemon logs show successful import, and forest-cli sync status confirms the node is running and syncing from the imported height.

Now try to continue merging from one of these large snapshots using forest-tool archive merge:

Case A – from merged_batch_96, BATCH=10

BATCH=10 FOREST_MERGE_IMAGE=ghcr.io/chainsafe/forest:2026-03-06-13cd7c3 \
  ./scripts/run_merge_with_fix.sh

The script resumes from /data/merged_batch_96.forest.car.zst and runs:

[Batch 10/87] Merging diffs 97-106 -> /data/merged_batch_106.forest.car.zst
Error: input not recognized as any kind of CAR data (.car, .car.zst, .forest.car)

Case B – from merged_batch_96, BATCH=5

BATCH=5 FOREST_MERGE_IMAGE=ghcr.io/chainsafe/forest:2026-03-06-13cd7c3 \
  ./scripts/run_merge_with_fix.sh

This produces another very large CAR, then fails again on the next batch:

[Batch 20/173] Merging diffs 97-101 -> /data/merged_batch_101.forest.car.zst
-> done (665.5G)
[Batch 21/173] Merging diffs 102-106 -> /data/merged_batch_106.forest.car.zst
Error: input not recognized as any kind of CAR data (.car, .car.zst, .forest.car)

Case C – from merged_batch_101, BATCH=2

BATCH=2 FOREST_MERGE_IMAGE=ghcr.io/chainsafe/forest:2026-03-06-13cd7c3 \
  ./scripts/run_merge_with_fix.sh

The script resumes from /data/merged_batch_101.forest.car.zst:

Resuming: merged_batch_101 found. Starting from diff index 101 (batch 51/432)
...
[Batch 51/432] Merging diffs 102-103 -> /data/merged_batch_103.forest.car.zst

This shows that writing another large CAR still works from a multi-frame base, but subsequent attempts to advance toward merged_batch_106 again hit the same “input not recognized as any kind of CAR data” error.
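
The batch numbers and diff ranges in the script output are consistent with simple integer arithmetic on the resume index. run_merge_with_fix.sh is a local helper script that is not shown in this report, so the following is only a reconstruction inferred from its log lines, not its actual code; `plan_resume` is a hypothetical name.

```python
from typing import Tuple


def plan_resume(total_diffs: int, batch_size: int, resume_index: int) -> Tuple[int, int, int, int]:
    """Return (batch_number, total_batches, first_diff, last_diff) for the next merge.

    resume_index is the number of diffs already folded into the base snapshot
    (e.g. 96 for merged_batch_96). Diffs are numbered from 1.
    This mirrors the numbers printed in the logs; it is an assumption about
    how the helper script computes them.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0 <= resume_index < total_diffs:
        raise ValueError("resume_index must be in [0, total_diffs)")

    # Ceiling division without floating point.
    total_batches = (total_diffs + batch_size - 1) // batch_size
    batch_number = resume_index // batch_size + 1
    first_diff = resume_index + 1
    last_diff = min(resume_index + batch_size, total_diffs)
    return batch_number, total_batches, first_diff, last_diff


if __name__ == "__main__":
    # The three cases from this report: 863 diffs in total.
    for batch_size, resume_index in [(10, 96), (5, 96), (2, 101)]:
        n, total, first, last = plan_resume(863, batch_size, resume_index)
        print(f"[Batch {n}/{total}] Merging diffs {first}-{last}")
```

With 863 diffs this gives batch 10/87 for diffs 97-106, batch 20/173 for diffs 97-101, and batch 51/432 for diffs 102-103, matching Cases A, B and C.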

Log output
Using Forest image with large-index fix: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3
Batch size: 10
Resuming: merged_batch_96 found. Starting from diff index 96 (batch 10/87)
Lite: /home/ubuntu/file-coin/snapshots/forest_snapshot_mainnet_2020-09-04_height_30000.forest.car.zst
Total diffs: 863 (batch size 10, image: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3, total batches: 87)
Progress log: /home/ubuntu/file-coin/snapshot/merge_progress.txt
[Batch 10/87] Merging diffs 97-106 -> /data/merged_batch_106.forest.car.zst
Error: input not recognized as any kind of CAR data (.car, .car.zst, .forest.car)
Merge failed at batch 10. Re-run to resume from merged_batch_106 (or remove incomplete /data/merged_batch_106.forest.car.zst and re-run to retry from previous).

Using Forest image with large-index fix: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3
Batch size: 5
Resuming: merged_batch_96 found. Starting from diff index 96 (batch 20/173)
Lite: /home/ubuntu/file-coin/snapshots/forest_snapshot_mainnet_2020-09-04_height_30000.forest.car.zst
Total diffs: 863 (batch size 5, image: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3, total batches: 173)
Progress log: /home/ubuntu/file-coin/snapshot/merge_progress.txt
[Batch 20/173] Merging diffs 97-101 -> /data/merged_batch_101.forest.car.zst
-> done (665.5G)
[Batch 21/173] Merging diffs 102-106 -> /data/merged_batch_106.forest.car.zst
Error: input not recognized as any kind of CAR data (.car, .car.zst, .forest.car)
Merge failed at batch 21. Re-run to resume from merged_batch_106 (or remove incomplete /data/merged_batch_106.forest.car.zst and re-run to retry from previous).

Using Forest image with large-index fix: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3
Batch size: 2
Resuming: merged_batch_101 found. Starting from diff index 101 (batch 51/432)
Lite: /home/ubuntu/file-coin/snapshots/forest_snapshot_mainnet_2020-09-04_height_30000.forest.car.zst
Total diffs: 863 (batch size 2, image: ghcr.io/chainsafe/forest:2026-03-06-13cd7c3, total batches: 432)
Progress log: /home/ubuntu/file-coin/snapshot/merge_progress.txt
[Batch 51/432] Merging diffs 102-103 -> /data/merged_batch_103.forest.car.zs

Expected behaviour
Large multi-skip-frame CAR archives produced by forest-tool archive merge (with PR #6690 applied) should be fully reusable:
The daemon can import them as snapshots (this already works), and
forest-tool archive merge should also be able to use them as inputs for subsequent merges without reporting “input not recognized as any kind of CAR data”.
In other words, the CAR reader used by forest-tool archive merge should understand the same multi-skip-frame encoding format that the daemon reader uses, so we can chain multiple merges all the way to a final merged.forest.car.zst.
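
The chaining described here can be sketched as follows. This is only an illustration: the function name `chained_merge_commands` and the `/data` output directory are assumptions, the `merged_batch_<N>` naming is taken from the file names in this report, and the sketch only builds argument lists rather than running anything. The only CLI pieces used are the ones already shown in the reproduction steps (forest-tool archive merge --force -o).

```python
from typing import Iterator, List, Sequence


def chained_merge_commands(
    base: str,
    diffs: Sequence[str],
    batch_size: int,
    already_merged: int = 0,
    out_dir: str = "/data",  # assumed output directory, as used in this report
) -> Iterator[List[str]]:
    """Yield one `forest-tool archive merge` argv per batch.

    Each batch takes the previous batch's output as its first input, so every
    intermediate archive must be readable by the merge command itself.
    `already_merged` is the number of diffs already contained in `base`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    current = base
    for start in range(0, len(diffs), batch_size):
        batch = list(diffs[start:start + batch_size])
        merged_upto = already_merged + start + len(batch)
        output = f"{out_dir}/merged_batch_{merged_upto}.forest.car.zst"
        yield ["forest-tool", "archive", "merge", "--force", "-o", output, current, *batch]
        current = output  # the output becomes the next batch's base


if __name__ == "__main__":
    # Placeholder diff names; the real file names are elided in this report.
    remaining = [f"/snapshots/<diff_{i}>" for i in range(97, 107)]
    for argv in chained_merge_commands(
        "/data/merged_batch_96.forest.car.zst", remaining, 5, already_merged=96
    ):
        print(" ".join(argv))
```

In this example the second command's base input is /data/merged_batch_101.forest.car.zst, the output of the first command, which is exactly the step that currently fails.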

Screenshots
N/A (CLI/daemon behaviour). Terminal screenshots can be provided if helpful.

Environment (please complete the following information):
OS: Ubuntu Linux (e.g. 6.8.0-85-generic, x86_64)
Branch/commit: main at commit 13cd7c3 (Forest image ghcr.io/chainsafe/forest:2026-03-06-13cd7c3 which includes PR #6690)
Hardware: (for example, please fill in)
N CPU cores
M GB RAM
Fast SSD for Forest data and snapshot volume

Other information and links
This is a follow-up to the original large-index TryFromIntError issue, which PR #6690 fixed by:
Widening index size to u64,
Emitting multi-skip-frame Zstd encoding via ZstdSkipFramesEncodedDataReader / write_zstd_skip_frames_into, etc.
The fix appears to be partially applied: the daemon’s reader accepts the new format, but the archive-merge CLI’s CAR detection/reader path seems not to recognize these multi-skip-frame CARs as valid input.
I can run additional forest-tool archive info on the problematic CARs and share their reported metadata, or test any debug builds with extra logging around CAR detection and index reading.
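
For context, here is a minimal sketch of why a large payload needs several skippable frames and what a reader has to do to get it back. In the Zstandard frame format a skippable frame is a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian size and then that many bytes of user data, so a single frame cannot carry more than 2^32 - 1 bytes. The helper names `write_skip_frames` and `read_skip_frames` are hypothetical, and this does not reproduce Forest's actual on-disk layout or the behaviour of ZstdSkipFramesEncodedDataReader / write_zstd_skip_frames_into.

```python
import io
import struct

SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
MAX_FRAME_PAYLOAD = 0xFFFFFFFF  # the frame size field is a 4-byte unsigned integer


def write_skip_frames(payload: bytes, out, max_payload: int = MAX_FRAME_PAYLOAD) -> int:
    """Write payload as consecutive skippable frames and return the frame count.

    An empty payload writes no frames. max_payload is exposed only so the
    splitting can be demonstrated with small inputs.
    """
    if not 0 < max_payload <= MAX_FRAME_PAYLOAD:
        raise ValueError("max_payload must fit in the 32-bit size field")
    count = 0
    for start in range(0, len(payload), max_payload):
        chunk = payload[start:start + max_payload]
        out.write(struct.pack("<II", SKIPPABLE_MAGIC_MIN, len(chunk)))
        out.write(chunk)
        count += 1
    return count


def read_skip_frames(inp) -> bytes:
    """Concatenate the payloads of consecutive skippable frames.

    Stops at EOF or at the first non-skippable magic number, leaving the
    stream positioned at the bytes it did not consume.
    """
    parts = []
    while True:
        header = inp.read(8)
        if len(header) < 8:
            inp.seek(-len(header), io.SEEK_CUR)
            break
        magic, size = struct.unpack("<II", header)
        if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
            inp.seek(-8, io.SEEK_CUR)
            break
        chunk = inp.read(size)
        if len(chunk) != size:
            raise ValueError("truncated skippable frame")
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    buf = io.BytesIO()
    frames = write_skip_frames(b"abcdefghij", buf, max_payload=4)
    buf.seek(0)
    print(frames, read_skip_frames(buf))
```

A reader that only looks at the first skippable frame would see a truncated payload, which is one way two code paths could disagree about the same file. Whether that is what happens in the archive-merge input path is not established here.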

Labels: Type: Bug
