Conversation
📝 WalkthroughWalkthroughAdds native PostgreSQL replication support alongside Spock: new query helpers and SQL templates for native origins/slots, runtime detection of the Spock extension with conditional logic, refactors to use origin-name abstractions across consistency/repair code, and a testEnv plus native integration tests and CI step. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/consistency/mtree/merkle.go (1)
501-520:⚠️ Potential issue | 🟠 MajorKeep per-node origin-ID mappings instead of sharing one node's map.
buildFetchRowsSQL*now extracts the rawroidentfrompg_xact_commit_timestamp_origin(xmin), butloadSpockNodeNames()returns after loading just the first available node's origin names, then applies that single mapping to rows from every node viaTranslateNodeOrigin(). Each PostgreSQL node maintains its ownpg_replication_origincatalog, so the same numericroidentvalue may represent different origins on different nodes. This will causenode_originto be mislabeled in Merkle diff output. Either maintain per-node mappings or ensure origin lookups happen per node as rows are processed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/mtree/merkle.go` around lines 501 - 520, loadSpockNodeNames currently stops after the first successful node and sets a single SpockNodeNames map which is then (incorrectly) used for all nodes; change the task to store per-node origin mappings (e.g. SpockNodeNames map[string]map[uint32]string keyed by node identifier), have loadSpockNodeNames populate the map for each ClusterNodes entry (do not return after first success and cache failures per-node), and update places that call TranslateNodeOrigin (and the code paths that use the raw roident produced by buildFetchRowsSQL* functions) to look up the origin name from the per-node map for that specific node. Ensure connections are closed on every iteration and errors are collected/returned appropriately if a node’s mapping cannot be loaded.internal/consistency/diff/table_diff.go (1)
186-209:⚠️ Potential issue | 🟠 Majorroident from first pool reused across all nodes breaks filtering and metadata on native PostgreSQL.
loadSpockNodeNames()loads origin-to-name mappings from a single pool, while bothbuildEffectiveFilter()(line 255) andfetchRows()(line 449) use that mapping to filter rows and label metadata viapg_xact_commit_timestamp_origin(xmin). PostgreSQL documents that roident values are specific to each cluster'spg_replication_origincatalog—they differ across instances. Applying one node's roident mapping (e.g.,roident = '1'from the first pool) to other nodes will filter wrong rows and mislabelnode_originin diff output, particularly in native replication setups where roident IDs are never globally synchronized.This needs per-node origin resolution: either query each pool separately for its mappings, or translate roident to stable identifiers (like
roname) within each node's query before comparison and serialization.Also applies to: 253–256, 448–450; affects
table_rerun.goExecuteRerunTask path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/diff/table_diff.go` around lines 186 - 209, loadSpockNodeNames currently queries only the first pool and reuses that origin->name mapping across all nodes, which is incorrect for per-cluster roident values; change TableDiffTask.loadSpockNodeNames to iterate t.Pools and call queries.GetNodeOriginNames for each pool, storing the result keyed by pool identifier (e.g., pool name or connection string) into a per-pool mapping (replace t.SpockNodeNames map[string]string with map[string]map[string]string or similar), then update callers buildEffectiveFilter and fetchRows (and table_rerun.go's ExecuteRerunTask path) to look up the appropriate per-pool mapping when filtering or labeling using pg_xact_commit_timestamp_origin(xmin) so each node resolves roident->roname using its own pool's mapping.internal/consistency/repair/table_repair.go (1)
195-204:⚠️ Potential issue | 🔴 CriticalAdd
LOCALkeyword to makesession_replication_roletransaction-scoped.
SET session_replication_rolepersists afterCOMMITon the session and will leak into subsequent transactions when the connection is returned to the pool. UseSET LOCAL session_replication_roleinstead to ensure it reverts at transaction end and does not affect later repairs on the same pooled connection.Suggested fix
if t.FireTriggers { - _, err = tx.Exec(t.Ctx, "SET session_replication_role = 'local'") + _, err = tx.Exec(t.Ctx, "SET LOCAL session_replication_role = 'local'") } else { - _, err = tx.Exec(t.Ctx, "SET session_replication_role = 'replica'") + _, err = tx.Exec(t.Ctx, "SET LOCAL session_replication_role = 'replica'") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/repair/table_repair.go` around lines 195 - 204, The session_replication_role is being set with a non-transaction-scoped command; update the two tx.Exec calls in table_repair.go (the block using t.FireTriggers and tx.Exec) to use "SET LOCAL session_replication_role = 'local'" and "SET LOCAL session_replication_role = 'replica'" so the setting is transaction-scoped and will revert at transaction end; keep the existing error handling (returning fmt.Errorf(...) on err) and the logger.Debug call as-is (logger.Debug("session_replication_role set on %s (fire_triggers: %v)", nodeName, t.FireTriggers)).
🧹 Nitpick comments (5)
tests/integration/table_repair_test.go (3)
1239-1243: Multiple Spock-specific tests not refactored.
TestTableRepair_PreserveOriginand related tests (FixNulls_PreserveOrigin,Bidirectional_PreserveOrigin,MixedOps_PreserveOrigin) directly use Spock-specific features without thetestEnvabstraction. These tests should either be documented as Spock-only or have skip guards added for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 1239 - 1243, TestTableRepair_PreserveOrigin and its related tests (FixNulls_PreserveOrigin, Bidirectional_PreserveOrigin, MixedOps_PreserveOrigin) use Spock-specific features directly; update them to either be documented as Spock-only or add a skip guard using the test environment helper (e.g., call testEnv.RequireSpock() or t.SkipUnlessSpock() at the start of each test) so they only run when Spock is available; modify the beginning of functions TestTableRepair_PreserveOrigin, FixNulls_PreserveOrigin, Bidirectional_PreserveOrigin, and MixedOps_PreserveOrigin to invoke the shared testEnv skip/check helper (or add t.Skip with a clear message) before any Spock-specific setup.
5-5: Copyright year inconsistency.This file uses
2023 - 2025while other files in this PR use2023 - 2026. Consider updating for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` at line 5, Update the copyright header in the file to match the project's year range by changing "2023 - 2025" to "2023 - 2026" in the file header comment (the top-line copyright comment string).
573-750: Inconsistent refactoring: test still usespgClusterdirectly.
TestTableRepair_VariousDataTypesdirectly usespgCluster.Node1Pool,spock.repair_mode(), andspock.repset_add_table()without thetestEnvabstraction. This test will fail if run in a native PG environment.If this test is intentionally Spock-only (due to repset usage), consider adding a skip guard or documenting it. Otherwise, consider refactoring it to use
testEnvlike the other tests.♻️ Proposed fix: Add skip guard if Spock-only
func TestTableRepair_VariousDataTypes(t *testing.T) { + // This test requires Spock for repset functionality + env := newSpockEnv() + if !env.HasSpock { + t.Skip("Skipping: requires Spock extension") + } + ctx := context.Background()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 573 - 750, This test uses Spock-specific APIs (pgCluster.Node1Pool/Node2Pool, spock.repair_mode, spock.repset_add_table) but wasn’t guarded or refactored to testEnv; either add an explicit skip at the top of TestTableRepair_VariousDataTypes (e.g., if !testEnv.IsSpock() { t.Skip("spock-only test") }) or refactor the test to use testEnv helpers (replace pgCluster.Node1Pool/Node2Pool with testEnv-provided pools and replace direct spock SQL calls with testEnv helper methods for repset_add_table and repair_mode) so the test runs correctly in non-Spock environments.tests/integration/test_env_test.go (2)
202-202: Unused parametercompositeinsetupDivergence.The
compositeparameter is accepted but never used in the method body. Consider removing it if not needed, or document why it's retained for API consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_env_test.go` at line 202, The parameter `composite` on function `setupDivergence` is unused; remove it from the function signature and from all call sites (update any tests calling `setupDivergence(t, ctx, qualifiedTableName, composite)` to `setupDivergence(t, ctx, qualifiedTableName)`), or if you must preserve the API, explicitly mark it as used by adding a no-op reference (for example `_ = composite`) inside `setupDivergence` so linters/tests stop flagging it as unused; adjust imports/usages accordingly.
357-357: Hardcoded sleep may cause flaky tests or slow execution.The 2-second sleep before running the repair task lacks documentation explaining why it's needed. If it's working around a race condition, consider adding a comment. If it's no longer necessary, removing it would speed up tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_env_test.go` at line 357, Replace the hardcoded time.Sleep(2 * time.Second) in the test with a deterministic wait: either poll the expected condition (e.g., loop with time.Sleep(100*time.Millisecond) and a timeout checking the repair task's observable effect or state) until success, or, if the delay is truly required, add a clear comment above the time.Sleep explaining why the pause is necessary and reference the specific repair invocation being synchronized with; ensure any polling uses a reasonable timeout and fails the test on expiry rather than hanging indefinitely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@db/queries/queries.go`:
- Around line 989-999: GetNodeOriginNames currently chooses GetSpockNodeNames
based on the spock extension existing, which mismatches replication origin OIDs;
change it to detect the ID source by querying pg_replication_origin and deciding
the translator based on whether its roident values map to spock.node.node_id or
to pg_replication_origin identifiers. Concretely: replace the extname check with
a lightweight query that inspects pg_replication_origin (optionally joining
against spock.node) to see if roident corresponds to spock.node.node_id (use a
join like pg_replication_origin.roident::text = spock.node.node_id::text or
existence of matching rows); if that test indicates spock node IDs are the
source, call GetSpockNodeNames(ctx, db), otherwise call
GetReplicationOriginNames(ctx, db). Ensure you retain error handling from the
original GetNodeOriginNames.
In `@db/queries/templates.go`:
- Around line 1325-1332: The Templates struct is missing the CompareBlocksSQL
field referenced by the struct literal; add a field named CompareBlocksSQL of
type *template.Template to the Templates struct definition so the struct literal
that sets CompareBlocksSQL compiles (ensure the field name exactly matches
CompareBlocksSQL and the type matches other template fields like those around
native LSN templates).
- Around line 1551-1568: The LIKE '%' || $1 || '%' substring matching in the
templates GetNativeOriginLSNForNode and GetNativeSlotLSNForNode can return wrong
subscriptions when node names are substrings of each other; update both WHERE
clauses to use exact matching (s.subname = $1) if subscription naming permits,
otherwise replace the LIKE with a stricter pattern that enforces boundaries (for
example a regex match using s.subname ~ ('\m' || $1 || '\M') or a LIKE that
includes explicit delimiters present in your subscription names) so the query
selects the intended subscription rather than any substring match.
In `@internal/consistency/diff/repset_diff.go`:
- Around line 158-169: The spock extension check should run for every node
before proceeding to repset discovery: remove the conditional that gates
queries.CheckSpockInstalled on len(repsetNodeNames) and call
queries.CheckSpockInstalled(c.Ctx, pool) for each node (using nodeName and
pool), closing pool and returning a formatted error on failure; if spock is not
installed on that node return an error that includes nodeName (instead of
deferring to CheckRepSetExists), then continue to CheckRepSetExists only after
the per-node spock check succeeds.
In `@internal/consistency/diff/spock_diff.go`:
- Around line 310-320: The loop over t.Pools only validates the first pool then
breaks, so some nodes may skip the spock check and fail later in
GetSpockNodeAndSubInfo; remove the premature break and instead iterate every
pool in t.Pools calling queries.CheckSpockInstalled(t.Ctx, pool), and if it
returns false return an error that includes the pool/node identifier (e.g.
pool.Name or pool.Node) so the message pinpoints the specific node missing the
spock extension; keep the same error wrapping for query failures from
CheckSpockInstalled.
In `@internal/consistency/repair/table_repair.go`:
- Around line 756-759: The code detects Spock only after validation which causes
fetchLSNsForNode()/autoSelectSourceOfTruth() to assume spockAvailable=false;
move detection earlier so spockAvailable is set once t.Pools exists or make
fetching lazy. Concretely, call t.detectSpock() immediately after t.Pools is
populated (before calling ValidateAndPrepare()) or add a lazy guard inside
fetchLSNsForNode() that invokes detectSpock() the first time it runs (and
respects t.DryRun), ensuring detectSpock() sets spockAvailable before any
LSN-fetching logic runs.
- Around line 165-180: detectSpock currently returns after the first successful
CheckSpockInstalled and uses map iteration order to set t.spockAvailable; change
it to check every entry in t.Pools (calling queries.CheckSpockInstalled for
each), record whether any node reports spockInstalled==true and any reports
false, and if both true and false are observed fail fast by returning an error
(do not silently pick the first result); otherwise set t.spockAvailable to the
unanimous boolean. Change detectSpock to return error and update its callers to
handle that error; keep logging for per-node failures but do not treat the first
successful probe as definitive.
In `@tests/integration/docker-compose-native.yaml`:
- Around line 28-30: Remove the fixed host port bindings so Docker can allocate
ephemeral host ports: in the docker-compose service definitions that currently
include "ports: - target: 5432 published: 7432" (and the similar block at
43-45), delete the "published: <port>" lines (or replace the full mapping with a
single target-only mapping) so only the container port is exposed and the tests
rely on MappedPort() in tests/integration/native_pg_test.go to discover the host
port at runtime.
In `@tests/integration/merkle_tree_test.go`:
- Around line 476-480: The goroutine attaches the CDC listener to
env.ClusterNodes[0] which can be in arbitrary order; instead find the node
matching env.ServiceN1 and pass that to cdc.ListenForChanges. Replace the
hardcoded env.ClusterNodes[0] lookup with a search over env.ClusterNodes for the
node whose name/service equals env.ServiceN1 (assign to nodeInfo), then call
cdc.ListenForChanges(ctx, nodeInfo) and keep defer wg.Done() as-is so the
listener watches the intended service.
---
Outside diff comments:
In `@internal/consistency/diff/table_diff.go`:
- Around line 186-209: loadSpockNodeNames currently queries only the first pool
and reuses that origin->name mapping across all nodes, which is incorrect for
per-cluster roident values; change TableDiffTask.loadSpockNodeNames to iterate
t.Pools and call queries.GetNodeOriginNames for each pool, storing the result
keyed by pool identifier (e.g., pool name or connection string) into a per-pool
mapping (replace t.SpockNodeNames map[string]string with
map[string]map[string]string or similar), then update callers
buildEffectiveFilter and fetchRows (and table_rerun.go's ExecuteRerunTask path)
to look up the appropriate per-pool mapping when filtering or labeling using
pg_xact_commit_timestamp_origin(xmin) so each node resolves roident->roname
using its own pool's mapping.
In `@internal/consistency/mtree/merkle.go`:
- Around line 501-520: loadSpockNodeNames currently stops after the first
successful node and sets a single SpockNodeNames map which is then (incorrectly)
used for all nodes; change the task to store per-node origin mappings (e.g.
SpockNodeNames map[string]map[uint32]string keyed by node identifier), have
loadSpockNodeNames populate the map for each ClusterNodes entry (do not return
after first success and cache failures per-node), and update places that call
TranslateNodeOrigin (and the code paths that use the raw roident produced by
buildFetchRowsSQL* functions) to look up the origin name from the per-node map
for that specific node. Ensure connections are closed on every iteration and
errors are collected/returned appropriately if a node’s mapping cannot be
loaded.
In `@internal/consistency/repair/table_repair.go`:
- Around line 195-204: The session_replication_role is being set with a
non-transaction-scoped command; update the two tx.Exec calls in table_repair.go
(the block using t.FireTriggers and tx.Exec) to use "SET LOCAL
session_replication_role = 'local'" and "SET LOCAL session_replication_role =
'replica'" so the setting is transaction-scoped and will revert at transaction
end; keep the existing error handling (returning fmt.Errorf(...) on err) and the
logger.Debug call as-is (logger.Debug("session_replication_role set on %s
(fire_triggers: %v)", nodeName, t.FireTriggers)).
---
Nitpick comments:
In `@tests/integration/table_repair_test.go`:
- Around line 1239-1243: TestTableRepair_PreserveOrigin and its related tests
(FixNulls_PreserveOrigin, Bidirectional_PreserveOrigin, MixedOps_PreserveOrigin)
use Spock-specific features directly; update them to either be documented as
Spock-only or add a skip guard using the test environment helper (e.g., call
testEnv.RequireSpock() or t.SkipUnlessSpock() at the start of each test) so they
only run when Spock is available; modify the beginning of functions
TestTableRepair_PreserveOrigin, FixNulls_PreserveOrigin,
Bidirectional_PreserveOrigin, and MixedOps_PreserveOrigin to invoke the shared
testEnv skip/check helper (or add t.Skip with a clear message) before any
Spock-specific setup.
- Line 5: Update the copyright header in the file to match the project's year
range by changing "2023 - 2025" to "2023 - 2026" in the file header comment (the
top-line copyright comment string).
- Around line 573-750: This test uses Spock-specific APIs
(pgCluster.Node1Pool/Node2Pool, spock.repair_mode, spock.repset_add_table) but
wasn’t guarded or refactored to testEnv; either add an explicit skip at the top
of TestTableRepair_VariousDataTypes (e.g., if !testEnv.IsSpock() {
t.Skip("spock-only test") }) or refactor the test to use testEnv helpers
(replace pgCluster.Node1Pool/Node2Pool with testEnv-provided pools and replace
direct spock SQL calls with testEnv helper methods for repset_add_table and
repair_mode) so the test runs correctly in non-Spock environments.
In `@tests/integration/test_env_test.go`:
- Line 202: The parameter `composite` on function `setupDivergence` is unused;
remove it from the function signature and from all call sites (update any tests
calling `setupDivergence(t, ctx, qualifiedTableName, composite)` to
`setupDivergence(t, ctx, qualifiedTableName)`), or if you must preserve the API,
explicitly mark it as used by adding a no-op reference (for example `_ =
composite`) inside `setupDivergence` so linters/tests stop flagging it as
unused; adjust imports/usages accordingly.
- Line 357: Replace the hardcoded time.Sleep(2 * time.Second) in the test with a
deterministic wait: either poll the expected condition (e.g., loop with
time.Sleep(100*time.Millisecond) and a timeout checking the repair task's
observable effect or state) until success, or, if the delay is truly required,
add a clear comment above the time.Sleep explaining why the pause is necessary
and reference the specific repair invocation being synchronized with; ensure any
polling uses a reasonable timeout and fails the test on expiry rather than
hanging indefinitely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1333b0b9-2f32-4543-bbbc-6dd36810011a
📒 Files selected for processing (16)
db/queries/queries.godb/queries/templates.gointernal/consistency/diff/repset_diff.gointernal/consistency/diff/spock_diff.gointernal/consistency/diff/table_diff.gointernal/consistency/diff/table_rerun.gointernal/consistency/mtree/merkle.gointernal/consistency/repair/table_repair.gotests/integration/docker-compose-native.yamltests/integration/helpers_test.gotests/integration/main_test.gotests/integration/merkle_tree_test.gotests/integration/native_pg_test.gotests/integration/table_diff_test.gotests/integration/table_repair_test.gotests/integration/test_env_test.go
Auto-detect whether spock is installed and branch accordingly across all repair, diff, rerun, and merkle code paths: - Add detectSpock() to table repair; conditionally call spock.repair_mode() only when spock is present, fall back to session_replication_role alone - Replace spock.xact_commit_timestamp_origin() with native PG14+ pg_xact_commit_timestamp_origin() in diff, rerun, and merkle queries - Add GetNodeOriginNames() wrapper that uses spock.node when available, falls back to pg_replication_origin for node name resolution - Add native PG alternatives for LSN queries (pg_subscription-based) - Add CheckSpockInstalled() utility; spock-diff and repset-diff now return a clear error when spock is not installed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tive PG Introduce a testEnv struct that encapsulates environment-specific state (pools, service names, cluster config, HasSpock flag), allowing the same test logic to run against both spock-replicated and vanilla PostgreSQL clusters with zero code duplication. Refactor 17 test functions (9 repair, 8 diff) to accept *testEnv. Add docker-compose-native.yaml with two postgres:17 containers and a TestNativePG suite that exercises all applicable shared tests (40 subtests) against standalone PG nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded `spock.` schema prefix with a template function
`{{aceSchema}}` that reads from config at render time. Defaults to
"spock" for backward compatibility but allows alternate schemas
(e.g. "ace") via the existing mtree.schema config key.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor all merkle tree test functions to accept *testEnv, replacing pgCluster globals and spock.repair_mode() calls with env-aware abstractions. All 28 merkle tree subtests (Init, Build, Diff, Merge, Split, CDC, Teardown, NumericScaleInvariance) now run on both spock and native PostgreSQL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…G queries - Use regex word boundaries (\m/\M) instead of LIKE substring matching to prevent node name collisions (e.g., n1 vs n10) - Check spock extension on every node instead of just the first - Make spock detection per-node and lazy, supporting mixed clusters and fixing early-access bug in recovery mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, each child table-diff opens its own SQLite connection instead of sharing the parent's handle. Matches CloneForSchedule(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The field and methods are used on both Spock and native PG paths, so the "Spock" prefix was misleading. Also fixes --against-origin error messages to list available origin IDs/names instead of referencing "spock node id". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
isSpockAvailable reads from t.Pools[nodeName], but the pool was stored after fetchLSNsForNode returned. This caused Spock clusters to silently use native PG LSN queries during recovery-mode auto-selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.Cfg is only safe during single-threaded startup. The aceSchema template function is evaluated at render time from concurrent goroutines, so it must use config.Get() which holds a read lock. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Refactored testTableDiff_VariousDataTypes, testTableDiff_UUIDColumn, testTableDiff_ByteaColumnSizeCheck, testTableRepair_VariousDataTypes, testTableRepair_TimestampAndTimeTypes, and testTableRepair_LargeBigintPK to accept *testEnv. Repset calls are conditional on HasSpock, repair_mode uses env.withRepairModeTx. All 6 now run in TestNativePG alongside the existing shared tests. Also adds TestNativePG to CI workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
79c57e8 to
fd57b45
Compare
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 1 medium |
🟢 Metrics 95 complexity · 20 duplication
Metric Results Complexity 95 Duplication 20
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/integration/merkle_tree_test.go (1)
897-903:⚠️ Potential issue | 🟡 MinorClose this ad-hoc pool or reuse
env.N2Pool.This allocates a second pgx pool for every split test and never closes it. Over the full integration suite that leaks connections and can make later tests flaky.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/merkle_tree_test.go` around lines 897 - 903, The test opens an ad-hoc pgx pool via connectToNode into the local variable pool and never closes it; either reuse the existing env.N2Pool or ensure the new pool is closed after use to avoid leaking connections. Replace the connectToNode call with env.N2Pool when reading leafNodeCountAfter (or, if a separate pool is required, call pool.Close() in a defer immediately after creation) and keep the reference to leafNodeCountAfter and the QueryRow/Scan on the chosen pool (look for variables/functions: leafNodeCountAfter, connectToNode, pool, env.N2Pool).internal/consistency/repair/table_repair.go (1)
2921-2953:⚠️ Potential issue | 🟠 MajorPropagate native LSN probe failures instead of treating them as missing data.
GetNativeOriginLSNForNode()andGetNativeSlotLSNForNode()already turn “not found” intonil. Any error that reaches this branch is a real probe failure; converting it tonillets recovery mode ignore a survivor and can auto-select the wrongsource_of_truth.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/repair/table_repair.go` around lines 2921 - 2953, The code incorrectly treats real errors from GetNativeOriginLSNForNode and GetNativeSlotLSNForNode as “no data” by logging a warning and returning nils; instead, when isSpockAvailable(t) is false and these native probe calls return a non-nil err, propagate/wrap and return that error so caller can handle a probe failure. Update the error branches in the originLSN and slotLSN sections (the blocks that call GetNativeOriginLSNForNode and GetNativeSlotLSNForNode) to return a wrapped fmt.Errorf(...) error (similar to the spock branch) rather than returning nil values, keeping the existing log only if you still want a warning before returning the error.tests/integration/table_repair_test.go (3)
908-921:⚠️ Potential issue | 🟡 MinorSame inconsistency - uses global
testSchemainstead ofenv.Schema.This function has the same issue as
testTableRepair_VariousDataTypes. ReplacetestSchemawithenv.Schemafor consistency with the environment abstraction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 908 - 921, In testTableRepair_TimestampAndTimeTypes, the test uses the global testSchema instead of the environment-specific env.Schema; update qualifiedTableName and the CREATE statement (the createSQL string) to use env.Schema everywhere (replace occurrences of testSchema with env.Schema) so the test respects the testEnv abstraction and matches testTableRepair_VariousDataTypes.
2093-2114:⚠️ Potential issue | 🟡 MinorSame inconsistency - uses global
testSchemainstead ofenv.Schema.Third instance of the schema inconsistency. Replace
testSchemawithenv.Schemathroughout this function.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 2093 - 2114, In testTableRepair_LargeBigintPK, replace uses of the global testSchema with the per-test env.Schema (e.g., in qualifiedTableName, createTableSQL, and any other occurrences such as schema-qualified strings for CREATE SCHEMA/CREATE TABLE and fmt.Sprintf calls) so the function uses env.Schema consistently; update the fmt.Sprintf arguments and any string literals that refer to "%s" schema to use env.Schema and ensure collisionPKs and other variables remain unchanged.
573-602:⚠️ Potential issue | 🟡 MinorInconsistent schema usage - uses global
testSchemainstead ofenv.Schema.This function uses the global
testSchemavariable while other refactored test functions useenv.Schema. This breaks the environment abstraction and would cause issues if native PG tests use a different schema.🔧 Replace testSchema with env.Schema
func testTableRepair_VariousDataTypes(t *testing.T, env *testEnv) { ctx := context.Background() tableName := "data_type_repair" - qualifiedTableName := fmt.Sprintf("%s.%s", testSchema, tableName) + qualifiedTableName := fmt.Sprintf("%s.%s", env.Schema, tableName) refTime := time.Date(2024, 1, 15, 10, 30, 0, 0, time.UTC) createDataTypeTableSQL := fmt.Sprintf(` -CREATE SCHEMA IF NOT EXISTS "%s"; -CREATE TABLE IF NOT EXISTS %s.%s ( +CREATE SCHEMA IF NOT EXISTS "%s"; +CREATE TABLE IF NOT EXISTS %s.%s ( ... -);`, testSchema, testSchema, tableName) +);`, env.Schema, env.Schema, tableName)Apply similar changes throughout the function.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 573 - 602, The test uses the global testSchema instead of the environment-specific env.Schema which breaks isolation; update all references in testTableRepair_VariousDataTypes (e.g., qualifiedTableName, createDataTypeTableSQL and any fmt.Sprintf schema interpolation) to use env.Schema, including the CREATE SCHEMA clause and any qualified table names and format arguments, ensuring the fmt.Sprintf parameter list matches the replaced env.Schema usage throughout the function.
🧹 Nitpick comments (2)
tests/integration/test_env_test.go (2)
313-325: Consider usingrequire.Emptyfor stronger assertion.Using
assert.Emptyallows the test to continue even when diffs are found. For a function namedassertNoTableDiff, a terminal failure viarequire.Emptymay be more appropriate to prevent misleading cascading failures.💡 Suggested change
- assert.Empty(t, tdTask.DiffResult.NodeDiffs, "Expected no differences after repair, but diffs were found") + require.Empty(t, tdTask.DiffResult.NodeDiffs, "Expected no differences after repair, but diffs were found")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_env_test.go` around lines 313 - 325, The test uses assert.Empty which allows continuation on failure; change the final assertion in assertNoTableDiff to use require.Empty so the test aborts immediately if tdTask.DiffResult.NodeDiffs is non-empty; locate the assertNoTableDiff function and replace the assert.Empty(t, tdTask.DiffResult.NodeDiffs, ...) call with require.Empty(t, tdTask.DiffResult.NodeDiffs, ...) preserving the existing message context.
355-358: Document the purpose of this sleep.The 2-second sleep before running the repair lacks explanation. If it's needed for replication lag, file system sync, or transaction visibility, document the reason. Unexplained sleeps are difficult to maintain and tune.
💬 Add a comment explaining the delay
+ // Wait for [reason: e.g., replication lag / transaction visibility / etc.] time.Sleep(2 * time.Second) if err := repairTask.Run(false); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_env_test.go` around lines 355 - 358, Add a brief inline comment above the sleep explaining why the 2-second delay is required (e.g., to wait for replication lag, file system flush, transaction visibility for latestDiffFile, or propagation from sourceOfTruthNode) and note any assumptions about timing and how to adjust it; reference the related operations newTableRepairTask(...) and repairTask.Run(false) in the comment and, if applicable, mention a preferred replacement (polling for a condition) to avoid a fixed sleep.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/consistency/repair/table_repair.go`:
- Around line 165-185: The isSpockAvailable function currently swallows probe
errors and returns false; change its signature to isSpockAvailable(nodeName
string) (bool, error), call queries.CheckSpockInstalled(t.Ctx, pool) and if it
returns an error return (false, err) without caching, only cache the boolean
result into t.spockPerNode[nodeName] when the probe succeeds, update the logger
calls to log the error before returning, and then update all callers of
isSpockAvailable to handle the (bool, error) result instead of a plain bool.
In `@tests/integration/cdc_busy_table_test.go`:
- Line 195: The test directly interpolates config.Cfg.MTree.Schema into SQL (see
the Exec call via pgCluster.Node1Pool.Exec), which breaks for schemas needing
quoting; build the identifier using pgx.Identifier with config.Cfg.MTree.Schema
and use the identifier's properly quoted SQL representation when formatting the
query string instead of splicing the raw value; apply the same change to the
other occurrence (line 266) so both queries safely handle configurable schema
names.
In `@tests/integration/test_env_test.go`:
- Around line 202-257: The function setupDivergence currently declares an unused
parameter composite (bool); remove the dead parameter from the function
signature and all its callers (e.g., calls that pass a composite argument) OR,
if composite is required, implement the missing branch that applies
composite-key-specific behavior inside setupDivergence (use the composite flag
to adjust INSERT/UPDATE statements or column lists). Update references to the
function (call sites) to match the new signature if you remove the parameter, or
add the composite-aware logic where setupDivergence uses composite to change SQL
formatting or column handling.
---
Outside diff comments:
In `@internal/consistency/repair/table_repair.go`:
- Around line 2921-2953: The code incorrectly treats real errors from
GetNativeOriginLSNForNode and GetNativeSlotLSNForNode as “no data” by logging a
warning and returning nils; instead, when isSpockAvailable(t) is false and these
native probe calls return a non-nil err, propagate/wrap and return that error so
caller can handle a probe failure. Update the error branches in the originLSN
and slotLSN sections (the blocks that call GetNativeOriginLSNForNode and
GetNativeSlotLSNForNode) to return a wrapped fmt.Errorf(...) error (similar to
the spock branch) rather than returning nil values, keeping the existing log
only if you still want a warning before returning the error.
In `@tests/integration/merkle_tree_test.go`:
- Around line 897-903: The test opens an ad-hoc pgx pool via connectToNode into
the local variable pool and never closes it; either reuse the existing
env.N2Pool or ensure the new pool is closed after use to avoid leaking
connections. Replace the connectToNode call with env.N2Pool when reading
leafNodeCountAfter (or, if a separate pool is required, call pool.Close() in a
defer immediately after creation) and keep the reference to leafNodeCountAfter
and the QueryRow/Scan on the chosen pool (look for variables/functions:
leafNodeCountAfter, connectToNode, pool, env.N2Pool).
In `@tests/integration/table_repair_test.go`:
- Around line 908-921: In testTableRepair_TimestampAndTimeTypes, the test uses
the global testSchema instead of the environment-specific env.Schema; update
qualifiedTableName and the CREATE statement (the createSQL string) to use
env.Schema everywhere (replace occurrences of testSchema with env.Schema) so the
test respects the testEnv abstraction and matches
testTableRepair_VariousDataTypes.
- Around line 2093-2114: In testTableRepair_LargeBigintPK, replace uses of the
global testSchema with the per-test env.Schema (e.g., in qualifiedTableName,
createTableSQL, and any other occurrences such as schema-qualified strings for
CREATE SCHEMA/CREATE TABLE and fmt.Sprintf calls) so the function uses
env.Schema consistently; update the fmt.Sprintf arguments and any string
literals that refer to "%s" schema to use env.Schema and ensure collisionPKs and
other variables remain unchanged.
- Around line 573-602: The test uses the global testSchema instead of the
environment-specific env.Schema which breaks isolation; update all references in
testTableRepair_VariousDataTypes (e.g., qualifiedTableName,
createDataTypeTableSQL and any fmt.Sprintf schema interpolation) to use
env.Schema, including the CREATE SCHEMA clause and any qualified table names and
format arguments, ensuring the fmt.Sprintf parameter list matches the replaced
env.Schema usage throughout the function.
---
Nitpick comments:
In `@tests/integration/test_env_test.go`:
- Around line 313-325: The test uses assert.Empty which allows continuation on
failure; change the final assertion in assertNoTableDiff to use require.Empty so
the test aborts immediately if tdTask.DiffResult.NodeDiffs is non-empty; locate
the assertNoTableDiff function and replace the assert.Empty(t,
tdTask.DiffResult.NodeDiffs, ...) call with require.Empty(t,
tdTask.DiffResult.NodeDiffs, ...) preserving the existing message context.
- Around line 355-358: Add a brief inline comment above the sleep explaining why
the 2-second delay is required (e.g., to wait for replication lag, file system
flush, transaction visibility for latestDiffFile, or propagation from
sourceOfTruthNode) and note any assumptions about timing and how to adjust it;
reference the related operations newTableRepairTask(...) and
repairTask.Run(false) in the comment and, if applicable, mention a preferred
replacement (polling for a condition) to avoid a fixed sleep.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 06f84ca0-993c-41e0-819e-fd8dddb46acc
📒 Files selected for processing (18)
.github/workflows/test.ymldb/queries/queries.godb/queries/templates.gointernal/consistency/diff/repset_diff.gointernal/consistency/diff/spock_diff.gointernal/consistency/diff/table_diff.gointernal/consistency/diff/table_rerun.gointernal/consistency/mtree/merkle.gointernal/consistency/repair/table_repair.gotests/integration/cdc_busy_table_test.gotests/integration/docker-compose-native.yamltests/integration/helpers_test.gotests/integration/main_test.gotests/integration/merkle_tree_test.gotests/integration/native_pg_test.gotests/integration/table_diff_test.gotests/integration/table_repair_test.gotests/integration/test_env_test.go
✅ Files skipped from review due to trivial changes (2)
- .github/workflows/test.yml
- tests/integration/docker-compose-native.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/consistency/diff/table_rerun.go
- internal/consistency/mtree/merkle.go
- internal/consistency/diff/table_diff.go
- internal/consistency/diff/repset_diff.go
- db/queries/queries.go
- db/queries/templates.go
…aram isSpockAvailable now returns (bool, error) so callers don't silently fall back to native PG mode on detection failures. Also removes the unused composite parameter from setupDivergence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/table_repair_test.go (1)
1253-1268:⚠️ Potential issue | 🟡 MinorPreserve-origin tests are Spock-only but lack skip guard.
The
TestTableRepair_PreserveOrigin,TestTableRepair_FixNulls_PreserveOrigin,TestTableRepair_Bidirectional_PreserveOrigin, andTestTableRepair_MixedOps_PreserveOrigintests use unconditional Spock calls (spock.repair_mode,spock.repset_add_table) and require 3 nodes (N3Pool). These will fail on native PostgreSQL test runs.Consider adding a skip guard at the beginning of these tests:
func TestTableRepair_PreserveOrigin(t *testing.T) { if pgCluster.Node3Pool == nil { t.Skip("PreserveOrigin tests require 3-node Spock cluster") } // ... rest of test }Or refactor to use the
testEnvpattern with appropriateHasSpockchecks.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/table_repair_test.go` around lines 1253 - 1268, Add a skip guard to the start of the Spock-only tests (TestTableRepair_PreserveOrigin, TestTableRepair_FixNulls_PreserveOrigin, TestTableRepair_Bidirectional_PreserveOrigin, TestTableRepair_MixedOps_PreserveOrigin) so they don't run against native Postgres; check pgCluster.Node3Pool != nil (or use testEnv.HasSpock if available) and call t.Skip with a clear message when the check fails, since these tests call Spock functions like spock.repair_mode and spock.repset_add_table and require a 3-node cluster.
🧹 Nitpick comments (1)
internal/consistency/repair/table_repair.go (1)
2051-2063: Minor inconsistency: ignoringspockRepairModeActivereturn value.Unlike other call sites (e.g.,
runFixNulls,runUnidirectionalRepair) that capturespockRepairModeActiveto gate the disable call, this path discards it and relies ondisableSpockRepairMode's internal re-check viaisSpockAvailable. This works correctly due to the caching, but is slightly inconsistent with the pattern elsewhere.Consider capturing the return value for consistency:
-if _, err := t.setupTransactionMode(tx, nodeName); err != nil { +if spockActive, err := t.setupTransactionMode(tx, nodeName); err != nil { return 0, err } ... -if err := t.disableSpockRepairMode(tx, nodeName); err != nil { +if spockActive { + if err := t.disableSpockRepairMode(tx, nodeName); err != nil { + return 0, err + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/consistency/repair/table_repair.go` around lines 2051 - 2063, The code calls setupTransactionMode and executeUpserts then calls disableSpockRepairMode but ignores the spockRepairModeActive flag used elsewhere; change the path so setupTransactionMode(tx, nodeName) returns and you capture the spockRepairModeActive boolean (e.g., spockRepairModeActive, err := t.setupTransactionMode(...)), then pass that boolean to the subsequent disable call (e.g., if spockRepairModeActive { if err := t.disableSpockRepairMode(tx, nodeName); err != nil { return 0, err } }) so the disable is gated consistently with other call sites (refer to setupTransactionMode, spockRepairModeActive, disableSpockRepairMode, executeUpserts).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/integration/table_repair_test.go`:
- Around line 1253-1268: Add a skip guard to the start of the Spock-only tests
(TestTableRepair_PreserveOrigin, TestTableRepair_FixNulls_PreserveOrigin,
TestTableRepair_Bidirectional_PreserveOrigin,
TestTableRepair_MixedOps_PreserveOrigin) so they don't run against native
Postgres; check pgCluster.Node3Pool != nil (or use testEnv.HasSpock if
available) and call t.Skip with a clear message when the check fails, since
these tests call Spock functions like spock.repair_mode and
spock.repset_add_table and require a 3-node cluster.
---
Nitpick comments:
In `@internal/consistency/repair/table_repair.go`:
- Around line 2051-2063: The code calls setupTransactionMode and executeUpserts
then calls disableSpockRepairMode but ignores the spockRepairModeActive flag
used elsewhere; change the path so setupTransactionMode(tx, nodeName) returns
and you capture the spockRepairModeActive boolean (e.g., spockRepairModeActive,
err := t.setupTransactionMode(...)), then pass that boolean to the subsequent
disable call (e.g., if spockRepairModeActive { if err :=
t.disableSpockRepairMode(tx, nodeName); err != nil { return 0, err } }) so the
disable is gated consistently with other call sites (refer to
setupTransactionMode, spockRepairModeActive, disableSpockRepairMode,
executeUpserts).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 667f54d6-2c8a-4781-996c-b8f681e7ef93
📒 Files selected for processing (4)
internal/consistency/repair/table_repair.gotests/integration/advanced_repair_test.gotests/integration/table_repair_test.gotests/integration/test_env_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/test_env_test.go
We previously required spock in order to run comparisons. With this PR, schema and table diff/repair can be used with vanilla Postgres.