Skip to content

[SPARK-55993][SS][TEST] Fix flaky RocksDBStateStoreIntegrationSuite bounded memory test#54808

Open
yaooqinn wants to merge 1 commit intoapache:masterfrom
yaooqinn:SPARK-55993
Open

[SPARK-55993][SS][TEST] Fix flaky RocksDBStateStoreIntegrationSuite bounded memory test#54808
yaooqinn wants to merge 1 commit intoapache:masterfrom
yaooqinn:SPARK-55993

Conversation

@yaooqinn
Copy link
Member

What changes were proposed in this pull request?

Fix flaky test bounded memory usage calculation in RocksDBStateStoreIntegrationSuite.

The test asserts RocksDBMemoryManager.getNumRocksDBInstances(false) == 0 (no unbounded instances) immediately after processAllAvailable(). However, RocksDBMemoryManager is a global singleton (object), and the streaming query may transiently register unbounded instances during state store initialization before the bounded-memory config takes full effect.

Changes:

  • Remove fragile pre-query assertion (getNumRocksDBInstances(true) == 0) that can fail if previous tests did not fully clean up the global singleton
  • Wrap post-query instance count assertions in eventually{} to tolerate transient registration during state store initialization
  • Same logical assertions (2 bounded, 0 unbounded) but with retry tolerance

Why are the changes needed?

The test fails intermittently in CI with 1 did not equal 0 at line 423, blocking unrelated PRs.

Does this PR introduce any user-facing change?

No — test-only change.

How was this patch tested?

Compilation verified. The fix addresses the root cause (global singleton race) by adding retry tolerance.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

…ounded memory test

The 'bounded memory usage calculation' test was flaky due to asserting
global RocksDBMemoryManager singleton state without accounting for
transient registration during streaming query initialization.

Changes:
- Remove fragile pre-query assertion (getNumRocksDBInstances(true) == 0)
  that can fail if previous tests didn't fully clean up
- Wrap post-query instance count assertions in eventually{} to tolerate
  transient unbounded instances during state store initialization
- Keep the same logical assertions (2 bounded, 0 unbounded) but allow
  brief transient state to settle

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant