feat(cli): add reingest command for re-ingesting execution results by lewisjared · Pull Request #610 · Climate-REF/climate-ref

lewisjared · 2026-03-31T11:39:36Z

Description

Add ref executions reingest command that re-runs build_execution_result() on existing output files and re-ingests the results into the database without re-executing diagnostics. This is useful when new series definitions or metadata extraction logic has been added.

Key features:

Three modes via ReingestMode enum: additive (keep existing, add new), replace (delete and re-ingest), versioned (new execution record)
Enum-based --mode option for better UX (tab completion, validated choices shown in help)
Safety guards: requires at least one filter, confirmation prompt, dry-run support
Scratch directory pattern: copies results to scratch before re-extraction, cleans up via try/finally
Savepoint-based rollback: DB mutations happen in a single savepoint so failures don't corrupt state
Path traversal safety: validates all computed paths stay within expected directories
Versioned re-run support: repeated versioned reingests produce unique output fragments (_v{hash}_n{count})
Failed execution recovery: mark_successful is called for all modes, allowing --include-failed to recover failed executions

Changes

packages/climate-ref/src/climate_ref/executor/reingest.py -- new module with reingest logic, mode enum, execution definition reconstruction, and metric ingestion
packages/climate-ref/src/climate_ref/cli/executions.py -- new reingest CLI command with enum-based mode, filters, dry-run, and confirmation
packages/climate-ref/src/climate_ref/executor/__init__.py -- no eager reingest import (avoids breaking imports-without-extras)
packages/climate-ref/tests/unit/executor/test_reingest.py -- 22 tests covering all modes, idempotency, rollback, dirty flag preservation, scratch cleanup, versioned re-runs, and failed execution recovery

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

Add `ref executions reingest` command that re-runs build_execution_result() on existing outputs without re-executing diagnostics. Uses ReingestMode enum for the --mode option, providing tab completion and validated choices (additive, replace, versioned) via Typer's native enum support.

…d scratch cleanup Collapse four near-identical metric ingestion functions into two by adding an optional `existing` parameter for additive dedup. Hoist the dimension query into `_ingest_metrics` to avoid duplicate DB queries. Fix double iteration of `iter_results()` generator. Add try/finally scratch directory cleanup to prevent disk accumulation during batch reingests. Remove redundant dirty-flag save/restore in CLI.

Copilot

Pull request overview

Adds a new “reingest” workflow to the Climate REF CLI/executor layer to re-run build_execution_result() against existing on-disk outputs and ingest updated metrics/metadata into the DB without re-executing diagnostics.

Changes:

Introduces climate_ref.executor.reingest with ReingestMode and reingest/query helpers.
Adds ref executions reingest CLI command with safety guards (filters required, confirm prompt, dry-run) and mode selection.
Adds unit tests for reingest modes and filtering, plus a changelog entry.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
packages/climate-ref/src/climate_ref/executor/reingest.py	Implements reingest logic, scratch copying, savepoint-based DB mutation, and execution querying.
packages/climate-ref/src/climate_ref/cli/executions.py	Adds `ref executions reingest` command wiring and UX/safety features.
packages/climate-ref/src/climate_ref/executor/init.py	Exposes `ReingestMode` and `reingest_execution` in executor exports.
packages/climate-ref/tests/unit/executor/test_reingest.py	New unit tests covering reingest behavior across modes and filters.
changelog/610.feature.md	Documents the new CLI command and modes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/climate-ref/src/climate_ref/executor/reingest.py

packages/climate-ref/src/climate_ref/cli/executions.py

packages/climate-ref/tests/unit/executor/test_reingest.py

codecov · 2026-03-31T11:48:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag	Coverage Δ
core	`?`
providers	`91.85% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.
see 81 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Fix ensure_relative_path to fall back to scratch base when output bundle contains absolute paths under scratch directory - Always clear ExecutionOutput rows before re-inserting in all modes (fixes duplicate outputs in additive mode) - Add _n suffix for repeated versioned reingests on the same execution - Mark execution as successful for all modes, not just versioned - Add path traversal safety checks in copy operations - Document single-threaded scratch directory assumption - Fix test to use ResultOutputType enum instead of raw string - Fix comment wording (session -> transaction) - Add tests for versioned re-run uniqueness, mark_successful on failed executions, and scratch directory cleanup

The reingest module imports pandas, CMECMetric, and other heavy dependencies that are not available when the package is installed without extras. The CLI imports directly from climate_ref.executor.reingest, so the re-export is unnecessary.

Add tests for previously uncovered helper functions: - _copy_results_to_scratch: path traversal rejection and directory copy - _get_existing_metric_dimensions: empty and populated cases - _handle_reingest_output_bundle: output registration with real data - _ingest_metrics: scalar ingestion and additive dedup with real CMEC bundles - reconstruct_execution_definition: with linked CMIP6 datasets Add output_dir_with_data fixture with valid 2-level nested CMECMetric JSON and output bundles with plot entries.

Extract mock_result_factory fixture and _patch_build_result helper to eliminate 11 repetitions of 8-line mock ExecutionResult boilerplate. Consolidate shared TSeries data into SAMPLE_SERIES constant and results dir setup into _create_results_dir helper. Add autouse fixture for provider registration in TestGetExecutionsForReingest.

Add 22 new tests covering the reingest CLI command and additional reingest module paths. Fix transaction handling bug in the CLI reingest command where db.session.begin() failed when a transaction was already active. Make --mode a required option so the user must explicitly choose the reingest strategy (additive, replace, or versioned). CLI tests (10): no-filter guard, no-results, dry-run, cancellation, force mode, group IDs, mode parametrized, include-failed, success path, missing mode error. Unit tests (13): unsuccessful result, scratch fallback, copy helpers, series ingestion, provider/diagnostic filters, empty datasets.

…n logic Extract shared functions (ingest_scalar_values, ingest_series_values, register_execution_outputs) into result_handling.py and have reingest.py import them instead of maintaining parallel implementations.

…uivalence tests Both handle_execution_result and reingest now delegate to a single ingest_execution_result() function for output registration and metric ingestion, preventing the two paths from drifting. Adds unit test verifying versioned reingest produces equivalent DB state to fresh ingestion, and integration tests that solve with the example provider then reingest in all three modes (additive, replace, versioned).

The '--mode' text in typer's error output includes ANSI bold escape codes around the dashes, causing the exact string match to fail in CI. Split into two assertions that match the key parts independently.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

packages/climate-ref/src/climate_ref/executor/reingest.py:502

For non-versioned modes, this function copies into original_results_dir but never validates that original_results_dir stays within config.paths.results. The only safety check is for target_results_dir, which isn't the destination used in the else-branch. Add the same (resolved) containment validation for original_results_dir before copying.
packages/climate-ref/src/climate_ref/executor/reingest.py:559
_copy_results_to_scratch() can raise (e.g., ValueError for unsafe fragments). reingest_execution() calls it outside of any try/except, so a single malformed Execution.output_fragment can abort the entire CLI run rather than being treated as a skipped execution. Consider catching exceptions from _copy_results_to_scratch (and from the subsequent filesystem copy-back) and returning False after logging, consistent with other skip paths.
packages/climate-ref/src/climate_ref/executor/reingest.py:603
The results tree is updated on disk before the surrounding DB transaction is committed (the CLI commits after reingest_execution returns). If the outer commit fails/rolls back for any reason, DB state and filesystem state can diverge. To keep them consistent, consider moving the copy-back to the caller after a successful commit (e.g., return the scratch_dir/target_execution and copy after the transaction) or committing within reingest_execution before mutating the results directory.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T11:36:17Z

packages/climate-ref/src/climate_ref/executor/reingest.py

+        return False
+
+    cv = CV.load_from_file(config.paths.dimensions_cv)
+
+    try:
+        target_execution = _apply_reingest_mode(
+            config=config,


The path-traversal guard here relies on Path.is_relative_to() for src/dst. This check is purely lexical and does not eliminate .. segments or resolve symlinks, so an output_fragment like ../outside can still pass while escaping the base when used by shutil. Consider validating using resolved paths (e.g., (base / fragment).resolve().is_relative_to(base.resolve())) and/or explicitly rejecting absolute paths and any .. parts before calling rmtree/copytree.

lewisjared added 4 commits March 31, 2026 20:19

style(cli): remove redundant inline comments in reingest command

42a9e33

chore: add changelog entry for PR #610

484d6bc

lewisjared requested a review from Copilot March 31, 2026 11:40

Copilot started reviewing on behalf of lewisjared March 31, 2026 11:42 View session

Copilot AI reviewed Mar 31, 2026

View reviewed changes

lewisjared added 9 commits April 2, 2026 12:17

chore: fix update to environs api

08d8ddb

fix: clean up finalized->finalised

c02beab

lewisjared requested a review from Copilot April 3, 2026 11:30

fix(test): handle ANSI escape codes in typer missing option error

9f05ba0

The '--mode' text in typer's error output includes ANSI bold escape codes around the dashes, causing the exact string match to fail in CI. Split into two assertions that match the key parts independently.

Copilot started reviewing on behalf of lewisjared April 3, 2026 11:30 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): add reingest command for re-ingesting execution results#610

feat(cli): add reingest command for re-ingesting execution results#610
lewisjared wants to merge 14 commits intomainfrom
feat/reingest-enum-mode

lewisjared commented Mar 31, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lewisjared commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lewisjared commented Mar 31, 2026 •

edited

Loading

codecov bot commented Mar 31, 2026 •

edited

Loading