[server] Integrate remote directory selector into table/partition creation#2769
[server] Integrate remote directory selector into table/partition creation#2769LiebingYu wants to merge 1 commit intoapache:mainfrom
Conversation
a510c15 to
c911eb4
Compare
...-server/src/main/java/org/apache/fluss/server/coordinator/remote/RemoteDirDynamicLoader.java
Outdated
Show resolved
Hide resolved
| public void createPartition( | ||
| TablePath tablePath, | ||
| long tableId, | ||
| String remoteDataDir, |
There was a problem hiding this comment.
I am thinking about possible improvements here:
1.
Will it be better to put the remote dir selection logic inside into the MetadataManager.createPartition() and MetadataManager.createTable().
I notice that now there exist some code pattern that (a). select a remoteDataDir value. (b) pass the value into createTable/createPartition functions, when we need to createTable/partition otherwhere.
Put the step(a) inside those functions will make the code more focused and void adding a remoteDataDir parameter in these functions.
Or
2.
Is is good to add the 'remoteDataDir' field into the TablePath class? Make the TablePath object carrys the remote dir info, and pass it through to everywhere. Then we don't need to care about adding an independent remoteDataDir parameter in functions which already has the TablePath parameter.
The implementation now is good to work. Just put my thoughts here. What do you think?
There was a problem hiding this comment.
-
Put the selection into
MetadataManageralso need to change a lot. By comparison, I think the current approach would actually introduce smaller changes. -
TablePathrepresents the logical path of a Fluss table, whileremoteDataDiris the physical storage path in the implementation. I don’t think it’s appropriate to putremoteDataDirintoTablePath.
c911eb4 to
a398846
Compare
0818c44 to
860ffd3
Compare
There was a problem hiding this comment.
Pull request overview
This PR integrates a pluggable, dynamically reconfigurable remote data directory selection mechanism into table and partition creation so each new table/partition is assigned a deterministic remoteDataDir at creation time (issue #2755).
Changes:
- Added
RemoteDirSelectorwith round-robin and smooth weighted round-robin implementations, plus aRemoteDirDynamicLoaderfor runtime config reloading. - Wired remote dir selection into Coordinator table/partition creation paths and into auto-partition creation, and extended metadata APIs to persist the chosen
remoteDataDir. - Added/updated unit and integration tests, and allowed
remote.data.dirs*configs to be changed dynamically.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/remote/RemoteDirSelector.java | New selector interface used by coordinator flows. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/remote/RoundRobinRemoteDirSelector.java | New round-robin selector implementation. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/remote/WeightedRoundRobinRemoteDirSelector.java | New smooth weighted round-robin selector implementation. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/remote/RemoteDirDynamicLoader.java | New dynamic loader implementing ServerReconfigurable for hot config reloads. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorServer.java | Instantiates/registers RemoteDirDynamicLoader; injects it into services/managers. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java | Selects a remote dir during createTable/createPartition; stores it in metadata. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/MetadataManager.java | Adds remoteDataDir parameter to create APIs; persists it in ZK registrations. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/AutoPartitionManager.java | Selects a remote dir for each auto-created partition. |
| fluss-server/src/main/java/org/apache/fluss/server/DynamicServerConfig.java | Allows remote.data.dirs, strategy, and weights to be dynamically altered. |
| fluss-common/src/main/java/org/apache/fluss/config/FlussConfigUtils.java | Extracts remote dir validation into validateRemoteDataDirs() and adds weight-sum validation. |
| fluss-common/src/test/java/org/apache/fluss/config/FlussConfigUtilsTest.java | Adds test coverage for invalid “all zero weights” configuration. |
| fluss-server/src/test/java/org/apache/fluss/server/testutils/FlussClusterExtension.java | Test cluster support for configuring multiple remote dirs. |
| fluss-server/src/test/java/org/apache/fluss/server/metadata/ZkBasedMetadataProviderTest.java | Updates table creation calls to pass remoteDataDir. |
| fluss-server/src/test/java/org/apache/fluss/server/DynamicConfigChangeTest.java | Adds test for dynamic reconfiguration affecting remote dir selector type. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/statemachine/TableBucketStateMachineTest.java | Updates context wiring to include RemoteDirDynamicLoader. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/rebalance/RebalanceManagerTest.java | Updates test wiring for new AutoPartitionManager constructor and config usage. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/event/watcher/TableChangeWatcherTest.java | Updates table creation calls to pass remoteDataDir. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessorTest.java | Updates wiring and table creation calls to pass remoteDataDir. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/AutoPartitionManagerTest.java | Adds remote dir loader/config to verify partitions’ remote dir assignments. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/remote/RoundRobinRemoteDirSelectorTest.java | New unit tests for round-robin selector behavior. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/remote/WeightedRoundRobinRemoteDirSelectorTest.java | New unit tests for weighted selector distribution properties. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/remote/RemoteDirDynamicLoaderTest.java | New unit tests for loader reconfiguration/validation behavior. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/remote/RemoteDirsITCase.java | New IT coverage verifying multi-remote-dir behavior across tables/partitions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...-server/src/main/java/org/apache/fluss/server/coordinator/remote/RemoteDirDynamicLoader.java
Show resolved
Hide resolved
...-server/src/main/java/org/apache/fluss/server/coordinator/remote/RemoteDirDynamicLoader.java
Show resolved
Hide resolved
fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java
Show resolved
Hide resolved
...ain/java/org/apache/fluss/server/coordinator/remote/WeightedRoundRobinRemoteDirSelector.java
Show resolved
Hide resolved
...er/src/main/java/org/apache/fluss/server/coordinator/remote/RoundRobinRemoteDirSelector.java
Show resolved
Hide resolved
| racks != null && racks.length == numOfTabletServers, | ||
| "racks must be not null and have the same length as numOfTabletServers"); | ||
| this.racks = racks; | ||
| this.remoteDirNames = remoteDirNames; |
There was a problem hiding this comment.
FlussClusterExtension assumes remoteDirNames is non-null (setRemoteDataDirs calls remoteDirNames.isEmpty()), but Builder.setRemoteDirNames does not null-check and the constructor doesn’t enforce non-null either. Passing null will cause a NullPointerException during cluster start. Consider defaulting null to Collections.emptyList() and/or validating the argument in setRemoteDirNames()/constructor.
| this.remoteDirNames = remoteDirNames; | |
| this.remoteDirNames = | |
| remoteDirNames != null ? remoteDirNames : Collections.emptyList(); |
| Optional<TableRegistration> tableOpt = zkClient.getTable(tablePath); | ||
| assertThat(tableOpt).isPresent(); | ||
| TableRegistration table = tableOpt.get(); | ||
| // Partitioned table should NOT have remoteDataDir set at table level |
There was a problem hiding this comment.
The comment says the partitioned table should NOT have remoteDataDir set at table level, but the assertion immediately below expects it to be non-null. Given table creation now always stores a selected remoteDataDir (used for metadata), the comment should be updated to match the actual expected behavior.
| // Partitioned table should NOT have remoteDataDir set at table level | |
| // Partitioned table should have remoteDataDir set at table level (used for metadata) |
860ffd3 to
9b8855a
Compare
Purpose
Linked issue: close #2755
Brief change log
Tests
API and Format
Documentation