[server] Support limiting rebalance buckets per round #3498

Open
fhan688 wants to merge 2 commits into apache:main from fhan688:support-rebalance-partitions-limitation

fhan688 (Contributor) commented Jun 18, 2026

Purpose

Linked issue: #3348

Large rebalance plans may activate too many bucket movement tasks at once, which can increase pressure on the CoordinatorServer and TabletServers in large clusters.

This PR adds a coordinator configuration that limits how many bucket rebalance tasks are activated in one round. When the limit is enabled and a generated rebalance plan exceeds it, Fluss splits the plan into recoverable rounds, persists the full progress in ZooKeeper, and activates the next round only after all bucket tasks in the current round reach a final state.
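The round gating described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the PR's code: `BucketStatus`, `RoundGate`, and the string bucket keys are hypothetical, and which statuses count as final is an assumption. A round is finished when every bucket in it has a final status; the active round is the first unfinished one (the same rule a restarted coordinator would use to resume), and canceling marks every non-final bucket as CANCELED.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-bucket status; the real enum in Fluss may use other names.
enum BucketStatus {
    PENDING, REBALANCING, COMPLETED, FAILED, CANCELED;

    // Assumption: COMPLETED, FAILED and CANCELED are the final states.
    boolean isFinal() {
        return this == COMPLETED || this == FAILED || this == CANCELED;
    }
}

// Hypothetical helper: each round maps a bucket key (e.g. "t1/b0") to its status.
final class RoundGate {
    private RoundGate() {}

    // Index of the round that should be active: the first round with a non-final
    // bucket, or -1 if every round has finished (an empty round counts as finished).
    static int firstUnfinishedRound(List<Map<String, BucketStatus>> rounds) {
        for (int i = 0; i < rounds.size(); i++) {
            boolean allFinal = rounds.get(i).values().stream().allMatch(BucketStatus::isFinal);
            if (!allFinal) {
                return i;
            }
        }
        return -1;
    }

    // Cancellation: keep final statuses as they are, mark everything else CANCELED.
    static List<Map<String, BucketStatus>> cancel(List<Map<String, BucketStatus>> rounds) {
        List<Map<String, BucketStatus>> result = new ArrayList<>();
        for (Map<String, BucketStatus> round : rounds) {
            Map<String, BucketStatus> copy = new LinkedHashMap<>();
            round.forEach((bucket, status) ->
                    copy.put(bucket, status.isFinal() ? status : BucketStatus.CANCELED));
            result.add(copy);
        }
        return result;
    }
}
```

In this sketch, the next round becomes active only when `firstUnfinishedRound` moves past the current index, which happens once every bucket in the current round has reached a final state.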

The default behavior remains unchanged.

Brief change log

  • Add coordinator.rebalance.max-buckets-per-round with a default value of 0.

    • 0 disables round limiting and keeps the existing behavior.
    • A positive value enables round-based rebalance execution when the generated plan is larger than the configured limit.
  • Move rebalance generation and registration into RebalanceManager.

    • Keep the legacy single-task path when round limiting is disabled or unnecessary.
    • Split large plans deterministically by table_id, partition_id, and bucket_id.
    • Persist round execution metadata and per-round bucket progress in ZooKeeper.
    • Continue writing the currently active round to the legacy rebalance task znode for compatibility with the existing execution flow.
  • Add round-based rebalance recovery and lifecycle handling.

    • Restore the first unfinished round after CoordinatorServer restart.
    • Aggregate progress from all persisted rounds for listRebalanceProgress.
    • Mark unfinished round tasks as CANCELED when canceling a round-based rebalance.
    • Advance to the next round once every bucket task in the current round reaches a final state.
  • Add ZooKeeper metadata models and JSON serde for:

    • RebalanceExecution
    • RebalanceRound
  • Add tests for:

    • default legacy behavior
    • small plans that do not need splitting
    • multi-round completion
    • partitioned table buckets
    • recovery from ZooKeeper
    • canceling round-based rebalance
    • JSON serde compatibility for new metadata
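The deterministic split from the change log can be sketched as a sort followed by fixed-size chunking. The `BucketKey` record and `RoundSplitter` class are hypothetical stand-ins for the PR's plan types, and ordering non-partitioned buckets (no partition id) before partitioned ones is an assumption. Because the sort key covers table_id, partition_id, and bucket_id, the same plan always produces the same rounds, regardless of the order in which its buckets were generated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical bucket identifier; partitionId is null for non-partitioned tables.
record BucketKey(long tableId, Long partitionId, int bucketId) {}

final class RoundSplitter {
    private RoundSplitter() {}

    // Order by table_id, then partition_id (nulls first, an assumption), then bucket_id.
    private static final Comparator<BucketKey> ORDER =
            Comparator.comparingLong(BucketKey::tableId)
                    .thenComparing(BucketKey::partitionId,
                            Comparator.nullsFirst(Comparator.naturalOrder()))
                    .thenComparingInt(BucketKey::bucketId);

    // Splits a plan into consecutive rounds of at most maxBucketsPerRound buckets each.
    static List<List<BucketKey>> split(List<BucketKey> plan, int maxBucketsPerRound) {
        if (maxBucketsPerRound <= 0) {
            throw new IllegalArgumentException("maxBucketsPerRound must be positive");
        }
        List<BucketKey> sorted = new ArrayList<>(plan);
        sorted.sort(ORDER);
        List<List<BucketKey>> rounds = new ArrayList<>();
        for (int from = 0; from < sorted.size(); from += maxBucketsPerRound) {
            int to = Math.min(from + maxBucketsPerRound, sorted.size());
            rounds.add(List.copyOf(sorted.subList(from, to)));
        }
        return rounds;
    }
}
```

Sorting before chunking keeps the buckets of one table, and of one partition, adjacent, so they land in the same round or in consecutive rounds.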

Tests

  • git diff --check
  • mvn -pl fluss-server -am -DskipITs -Dcheckstyle.skip=true -DfailIfNoTests=false -Dtest=RebalanceManagerTest,RebalanceExecutionJsonSerdeTest,RebalanceRoundJsonSerdeTest test

API and Format

  • Adds a new coordinator configuration:

    • coordinator.rebalance.max-buckets-per-round
  • No public client API or RPC protocol changes.

  • Adds new ZooKeeper metadata under the existing rebalance znode:

    • /cluster/rebalance/execution
    • /cluster/rebalance/rounds/[roundIndex]
  • Backward compatibility:

    • The default value 0 preserves existing behavior.
    • Existing single rebalance task metadata remains the active execution format when round limiting is disabled or the plan size does not exceed the configured limit.
    • In round-based mode, the active round is still registered as the legacy rebalance task.
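The compatibility rules and znode layout above can be summarized in a small Java sketch. `RebalanceLayout` and its methods are hypothetical; the paths follow the ones listed above, and writing the round index as a plain decimal number is an assumption. Round mode is used only when the limit is positive and the plan is larger than it, so a limit of 0, or a plan that fits within the limit, keeps the legacy single-task path.

```java
// Hypothetical helper; the PR defines its znodes in ZkData, which may look different.
final class RebalanceLayout {
    private RebalanceLayout() {}

    static final String REBALANCE_ROOT = "/cluster/rebalance";
    static final String EXECUTION_PATH = REBALANCE_ROOT + "/execution";

    // Assumption: the round index is written as an unpadded decimal number.
    static String roundPath(int roundIndex) {
        if (roundIndex < 0) {
            throw new IllegalArgumentException("roundIndex must be non-negative");
        }
        return REBALANCE_ROOT + "/rounds/" + roundIndex;
    }

    // 0 (and, as an assumption, any negative value) keeps the legacy path,
    // as does a plan that does not exceed the limit.
    static boolean useRoundMode(int maxBucketsPerRound, int planSize) {
        return maxBucketsPerRound > 0 && planSize > maxBucketsPerRound;
    }

    // Number of rounds needed in round mode: ceil(planSize / maxBucketsPerRound).
    static int roundCount(int maxBucketsPerRound, int planSize) {
        if (maxBucketsPerRound <= 0) {
            throw new IllegalArgumentException("maxBucketsPerRound must be positive");
        }
        return (planSize + maxBucketsPerRound - 1) / maxBucketsPerRound;
    }
}
```

For example, with the limit set to 500, a 1,200-bucket plan would be split into three rounds, while a 500-bucket plan would stay on the legacy path.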

Documentation

  • Add the new configuration to website/docs/maintenance/configuration.md.
  • Add operational guidance for limiting active rebalance bucket tasks to website/docs/maintenance/operations/rebalance.md.

Copilot AI left a comment

Pull request overview

This PR introduces an optional “round-based” rebalance execution mode in the CoordinatorServer to limit how many bucket movement tasks are activated per round, reducing pressure on large clusters while preserving legacy behavior by default (0 = disabled).

Changes:

  • Added coordinator.rebalance.max-buckets-per-round and documented it in the operations and configuration docs.
  • Implemented round-based rebalance splitting, persistence (ZooKeeper), recovery, and cancellation behavior in RebalanceManager.
  • Added ZooKeeper metadata models and JSON serde (RebalanceExecution, RebalanceRound) with corresponding unit tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
| --- | --- |
| website/docs/maintenance/operations/rebalance.md | Documents how to limit active bucket tasks via the new config. |
| website/docs/maintenance/configuration.md | Adds the new coordinator configuration option to the reference table. |
| fluss-server/src/test/java/org/apache/fluss/server/zk/data/RebalanceRoundJsonSerdeTest.java | Adds JSON serde tests for RebalanceRound. |
| fluss-server/src/test/java/org/apache/fluss/server/zk/data/RebalanceExecutionJsonSerdeTest.java | Adds JSON serde tests for RebalanceExecution. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/rebalance/RebalanceManagerTest.java | Adds tests for legacy vs round-based execution, completion, recovery, cancellation, and partitioned buckets. |
| fluss-server/src/main/java/org/apache/fluss/server/zk/ZooKeeperClient.java | Adds ZooKeeper APIs for rebalance execution/round metadata and recursive deletion under /cluster/rebalance. |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/ZkData.java | Defines new znodes for /cluster/rebalance/execution and /cluster/rebalance/rounds/*. |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/RebalanceRoundJsonSerde.java | Implements JSON serde for per-round persisted progress. |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/RebalanceRound.java | Adds a persistent model for a rebalance round (plan and per-bucket status). |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/RebalanceExecutionJsonSerde.java | Implements JSON serde for execution-level metadata. |
| fluss-server/src/main/java/org/apache/fluss/server/zk/data/RebalanceExecution.java | Adds a persistent model for round-based rebalance execution metadata. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/rebalance/RebalanceManager.java | Core implementation: splitting, persistence, recovery, progress aggregation, and round transitions. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorEventProcessor.java | Switches the rebalance flow to generateAndRegisterRebalance(...) and wires the config into RebalanceManager. |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | Adds the new config option definition and description. |
Comments suppressed due to low confidence (1)

fluss-server/src/main/java/org/apache/fluss/server/coordinator/rebalance/RebalanceManager.java:226

  • The log message concatenation is missing a space, so "no" + "rebalance" becomes "norebalance" in the emitted message. Consider using a single string literal to avoid this kind of formatting issue.
```java
LOG.error(
        "Failed to get rebalance plan from zookeeper, it will be treated as no"
                + "rebalance tasks.",
        e);
```
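The reviewer's suggested fix can be checked in isolation. This Java sketch reproduces the two-literal concatenation from the snippet and the single-literal alternative; the class and constant names are hypothetical, and in the real code the message would still be passed to `LOG.error(..., e)` together with the exception.

```java
// Hypothetical class that isolates the message text from the reviewed snippet.
final class RebalancePlanLogMessage {
    private RebalancePlanLogMessage() {}

    // As in the reviewed code: the two literals are joined with no separating space.
    static final String CONCATENATED =
            "Failed to get rebalance plan from zookeeper, it will be treated as no"
                    + "rebalance tasks.";

    // Suggested fix: one literal, so the space cannot be lost at a line break.
    static final String SINGLE_LITERAL =
            "Failed to get rebalance plan from zookeeper, it will be treated as no rebalance tasks.";
}
```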

