Fix incorrect measurement schema during compaction#17297
Fix incorrect measurement schema during compaction#17297
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses incorrect/unstable measurement schema (notably data type) selection during compaction by making compaction choose the “latest” schema (from schema engine / table cache) when source TsFiles contain conflicting schemas, and by expanding/refactoring compaction tests to cover these scenarios.
Changes:
- Update compaction executors/iterators to detect data type mismatches across files and reconcile schemas using the latest schema (tree model) or table cache (table model).
- Add a test-only schema fetcher injection path and new/refactored compaction tests (including parameterized coverage across performer types).
- Fix table schema copy behavior in
TsTable(copy tag column index map rather than a removed/incorrect map).
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/schema/table/TsTable.java | Fix copy-constructor to copy tag column index map; remove unused/incorrect map. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/utils/CompactionFakeSchemaFetcherImpl.java | Add schema fetcher test helper exposing schema tree. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/tablemodel/TableModelCompactionWithTTLTest.java | Use centralized performer selection helper. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchTest.java | Move into alterDataType package; parameterize by performer; reuse shared helpers. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchAlterableDataTypeTest.java | Refactor to shared base; reuse file-generation helpers. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeAlterTest.java | New parameterized alter-data-type compaction test for tree model. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeAlterTableTest.java | New parameterized alter-data-type compaction test for table model using DataNodeTableCache. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/AbstractCompactionAlterDataTypeTest.java | New shared base wiring a test schema fetcher + reusable TsFile generators. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/CompactionDataTypeAlterTest.java | Remove old non-parameterized test (replaced by new alterDataType package tests). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/AbstractCompactionTest.java | Add shared getPerformer(String) helper for tests. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/analyze/FakeSchemaFetcherImpl.java | Make schema tree and generator overridable (protected) for test specialization. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/readchunk/ReadChunkAlignedSeriesCompactionExecutor.java | Choose schema collection strategy for table vs tree model; reconcile schema for tree model using latest schema when needed. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/fast/FastNonAlignedSeriesCompactionExecutor.java | Add “can alter” validation before rewriting chunk types. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/batch/BatchedReadChunkAlignedSeriesCompactionExecutor.java | Thread database name through to aligned executor for table model schema lookup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/MultiTsFileDeviceIterator.java | Add database accessor; add schema reconciliation via latest schema (tree) and table cache (table); propagate corrected final types. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionUtils.java | Add test-only schema fetcher injection and helper to fetch latest measurement schemas for tree model. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionSeriesContext.java | Track whether series needs type update and allow setting final type explicitly. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/performer/impl/ReadChunkCompactionPerformer.java | Pass database into aligned executor; reconcile non-aligned chunk types using latest schema (tree). |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/performer/impl/FastCompactionPerformer.java | Simplify non-aligned compaction setup to use updated iterator context map. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| } | ||
|
|
||
| @After | ||
| public void tearDown() throws IOException, StorageEngineException { |
| TsTableColumnSchema schemaInTsTable = | ||
| tsTable.getColumnSchema(chunkMetadata.getMeasurementUid()); |
| if (measurementSchema == null && !timeseriesMetadata.getChunkMetadataList().isEmpty()) { | ||
| measurementSchema = | ||
| reader.getMeasurementSchema(timeseriesMetadata.getChunkMetadataList()); | ||
| } |
| // find correct data type | ||
| TSDataType correctDataType = null; | ||
| for (int i = readerAndChunkMetadataList.size() - 1; i >= 0 && correctDataType == null; i--) { | ||
| List<ChunkMetadata> chunkMetadataList = readerAndChunkMetadataList.get(i).getRight(); | ||
| if (chunkMetadataList == null || chunkMetadataList.isEmpty()) { | ||
| continue; | ||
| } | ||
| boolean hasDifferentDataTypes = false; | ||
| Iterator<Pair<TsFileSequenceReader, List<ChunkMetadata>>> descIterator = | ||
| readerAndChunkMetadataList.descendingIterator(); | ||
| while (descIterator.hasNext()) { | ||
| Pair<TsFileSequenceReader, List<ChunkMetadata>> pair = descIterator.next(); | ||
| List<ChunkMetadata> chunkMetadataList = pair.right; | ||
| TSDataType dataTypeInCurrentFile = null; | ||
| for (ChunkMetadata chunkMetadata : chunkMetadataList) { | ||
| if (chunkMetadata == null) { | ||
| continue; | ||
| if (chunkMetadata != null) { | ||
| dataTypeInCurrentFile = chunkMetadata.getDataType(); | ||
| break; | ||
| } | ||
| correctDataType = chunkMetadata.getDataType(); | ||
| } | ||
| if (correctDataType == null) { | ||
| correctDataType = dataTypeInCurrentFile; | ||
| } else if (correctDataType != dataTypeInCurrentFile) { | ||
| hasDifferentDataTypes = true; | ||
| break; | ||
| } | ||
| } | ||
| if (correctDataType == null) { | ||
| if (!hasDifferentDataTypes) { | ||
| return readerAndChunkMetadataList; | ||
| } | ||
|
|
||
| IMeasurementSchema schema = | ||
| CompactionUtils.getLatestMeasurementSchemasForTreeModel( | ||
| deviceID, Collections.singletonList(measurement)) | ||
| .get(0); | ||
| if (schema != null) { | ||
| correctDataType = schema.getType(); | ||
| } | ||
|
|
||
| LinkedList<Pair<TsFileSequenceReader, List<ChunkMetadata>>> result = new LinkedList<>(); | ||
| // check data type consistent and skip compact files with wrong data type | ||
| for (Pair<TsFileSequenceReader, List<ChunkMetadata>> tsFileSequenceReaderListPair : | ||
| readerAndChunkMetadataList) { | ||
| boolean dataTypeConsistent = true; | ||
| for (ChunkMetadata chunkMetadata : tsFileSequenceReaderListPair.getRight()) { | ||
| if (chunkMetadata != null | ||
| && !MetadataUtils.canAlter(chunkMetadata.getDataType(), correctDataType)) { | ||
| dataTypeConsistent = false; | ||
| if (chunkMetadata == null) { | ||
| continue; | ||
| } | ||
| if (chunkMetadata.getDataType() == correctDataType) { | ||
| break; | ||
| } | ||
| if (chunkMetadata != null && chunkMetadata.getDataType() != correctDataType) { | ||
| chunkMetadata.setNewType(correctDataType); | ||
| if (!MetadataUtils.canAlter(chunkMetadata.getDataType(), correctDataType)) { | ||
| dataTypeConsistent = false; | ||
| break; | ||
| } | ||
| chunkMetadata.setNewType(correctDataType); |
There was a problem hiding this comment.
Pull request overview
This PR addresses incorrect measurement schema selection during compaction (especially when data types change across source files), by consulting the latest schema from the schema fetcher / table cache instead of relying solely on chunk headers from input files.
Changes:
- Update compaction executors/iterators to resolve the “correct” measurement schema (tree model via schema fetcher; table model via
DataNodeTableCache) when data types differ across files. - Add test-only schema fetcher injection and new/updated compaction tests (parameterized across performers) to cover data type mismatch/alter scenarios, including table model.
- Fix a
TsTablecopy-constructor bug that copied the wrong index map.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/schema/table/TsTable.java | Fix copy-constructor to copy tagColumnIndexMap (previously copied the wrong map). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/utils/CompactionFakeSchemaFetcherImpl.java | New fake schema fetcher for compaction tests; exposes schema tree for mutation in tests. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/tablemodel/TableModelCompactionWithTTLTest.java | Use shared getPerformer(String) from AbstractCompactionTest; remove duplicated performer wiring. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchTest.java | Move into alterDataType package, parameterize performers, and populate fake schema tree for “latest schema” resolution. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchAlterableDataTypeTest.java | New parameterized tests for “alterable” type mismatch scenarios; populates fake schema tree. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeAlterTest.java | New parameterized tests validating value correctness after compaction with type alteration (aligned + non-aligned). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeAlterTableTest.java | New parameterized table-model tests validating metadata correctness under type alteration and schema cache usage. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/AbstractCompactionAlterDataTypeTest.java | New shared test base: thread naming, schema fetcher cleanup, helper file generators for different types. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/CompactionDataTypeNotMatchAlterableDataTypeTest.java | Deleted old non-parameterized version (superseded by alterDataType package tests). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/CompactionDataTypeAlterTest.java | Deleted old non-parameterized version (superseded by alterDataType package tests). |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/AbstractCompactionTest.java | Inject compaction fake schema fetcher in setup; centralize performer selection helper; cleanup schema cache in teardown. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/queryengine/plan/analyze/FakeSchemaFetcherImpl.java | Make schema tree field/method protected to allow compaction test subclassing. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/readchunk/ReadChunkAlignedSeriesCompactionExecutor.java | Resolve aligned value schemas via latest tree schema or table cache; add table-model collection path. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/fast/FastNonAlignedSeriesCompactionExecutor.java | Reject files whose datatype cannot be altered to the chosen final type; only rewrite when alterable. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/executor/batch/BatchedReadChunkAlignedSeriesCompactionExecutor.java | Thread database name into aligned read-chunk executor for table schema lookup. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/MultiTsFileDeviceIterator.java | Enhance schema collection for table model and fetch latest schema for tree model when type differs; expose database name. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionUtils.java | Add getLatestMeasurementSchemasForTreeModel(...) and test-only schema fetcher injection; use cache API with explicit flag. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/utils/CompactionSeriesContext.java | Track whether a series needs datatype update and allow overwriting final type. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/performer/impl/ReadChunkCompactionPerformer.java | Pass database name into aligned executor; choose correct datatype using latest schema when mixed types exist. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/compaction/execute/performer/impl/FastCompactionPerformer.java | Simplify non-aligned series setup to rely on updated CompactionSeriesContext for final type resolution. |
Comments suppressed due to low confidence (2)
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchTest.java:59
- This test declares a new
devicefield that shadows thedevicefield inherited fromAbstractCompactionAlterDataTypeTest. Even though they currently resolve to the same path, field shadowing is error-prone and makes it easy to accidentally use different device IDs between helpers (which use the superclass field) and assertions (which use this field). Prefer removing this field and using the inheriteddeviceeverywhere (or rename if a different device is truly intended).
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchTest.java:141 measurementSchemas1is constructed but not used (the aligned INT32 file is generated viagenerateInt32AlignedSeriesFile(...)). Consider removing this unused list to keep the setup clear.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| } | ||
|
|
||
| @Test | ||
| public void testCompactAlignedSeriesWithReadChunkCompactionPerformer() |
| List<IMeasurementSchema> measurementSchemas1 = new ArrayList<>(); | ||
| measurementSchemas1.add(new MeasurementSchema("s1", TSDataType.INT32)); | ||
| measurementSchemas1.add(new MeasurementSchema("s2", TSDataType.INT32)); | ||
|
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17297 +/- ##
============================================
+ Coverage 39.72% 39.78% +0.06%
Complexity 282 282
============================================
Files 5101 5099 -2
Lines 342143 342513 +370
Branches 43583 43660 +77
============================================
+ Hits 135914 136280 +366
- Misses 206229 206233 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| CompactionTsFileReader reader = (CompactionTsFileReader) readerMap.get(resource); | ||
|
|
||
| for (Map.Entry<String, Pair<TimeseriesMetadata, Pair<Long, Long>>> entrySet : |
| schemaFetcher | ||
| .getSchemaTree() | ||
| .appendSingleMeasurementPath( | ||
| new MeasurementPath(device, "s1", new MeasurementSchema("s1", TSDataType.DOUBLE))); |
There was a problem hiding this comment.
Add a case where s1 is still int32?
There was a problem hiding this comment.
Added in org/apache/iotdb/db/storageengine/dataregion/compaction/alterDataType/CompactionDataTypeNotMatchTest.java
| InnerSpaceCompactionTask task = | ||
| new InnerSpaceCompactionTask( | ||
| 0, tsFileManager, seqResources, true, new ReadChunkCompactionPerformer(), 0); | ||
| Assert.assertTrue(task.start()); |
| Assert.assertEquals( | ||
| 1, ((long) tsFileManager.getTsFileList(true).get(0).getStartTime(device).get())); |
|



Description