mongodb
runtime_error
ai_generated
partial
MongoServerError: balancer round trip time exceeded 30 seconds: chunk migration timed out for range
ID: mongodb/balancer-round-trip-timeout
78% Fix Rate
85% Confidence
1 Evidence
2024-05-12 First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| mongodb 6.0 | active | — | — | — |
| mongodb 7.0 | active | — | — | — |
| mongodb 8.0 | active | — | — | — |
Root Cause
A chunk migration between shards failed because of network latency or an overloaded source shard, so the migration exceeded the 30-second round-trip threshold.
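To see whether migrations are failing repeatedly, it can help to tally recent migration events recorded in the cluster's `config.changelog` collection. The following is a minimal sketch, not a definitive diagnostic: the helper name `summarizeMigrationEvents` is hypothetical, and it assumes each changelog document carries a `what` field whose value starts with `moveChunk.` for migration events. Verify both against your own cluster before relying on the counts.

```javascript
// Count migration-related changelog events by type.
// `entries` is an array of documents shaped like those in config.changelog.
// Assumption: migration events have a string `what` field prefixed "moveChunk.".
function summarizeMigrationEvents(entries) {
  const counts = {};
  for (const e of entries) {
    if (typeof e.what === "string" && e.what.startsWith("moveChunk.")) {
      counts[e.what] = (counts[e.what] || 0) + 1;
    }
  }
  return counts;
}

// Usage in mongosh, connected to a mongos (the limit of 200 is arbitrary):
// const recent = db.getSiblingDB("config").changelog
//   .find().sort({ time: -1 }).limit(200).toArray();
// printjson(summarizeMigrationEvents(recent));
// printjson(sh.getBalancerState());
```

A high share of error or aborted events relative to commits would point toward the latency or load problems described above rather than a one-off failure.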
Official Documentation
https://www.mongodb.com/docs/manual/reference/command/moveChunk/
Workarounds
- 75% success: Temporarily stop the balancer so that in-progress migrations can finish: `sh.stopBalancer(10000)` (the argument is a timeout in milliseconds). Re-enable it with `sh.startBalancer()` once network latency between shards is back to normal.
- 82% success: Increase the chunk size to reduce migration frequency: `db.getSiblingDB("config").settings.updateOne({ _id: "chunksize" }, { $set: { value: 128 } }, { upsert: true })` (value in MB), and make sure the source shard has sufficient I/O capacity. Check the current setting first, since recent releases may already default to 128 MB.
- 85% success: After verifying network health, migrate chunks manually with a longer timeout: `db.adminCommand({ moveChunk: 'mydb.mycoll', find: { shardKey: value }, to: 'targetShard', maxTimeMS: 60000 })`. Here `maxTimeMS` bounds only this manual command; it does not change the balancer's own limit.
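The first and third workarounds can be combined into one guarded sequence: stop the balancer, run a single manual migration, and re-enable the balancer even if the migration fails. This is a minimal sketch under stated assumptions. The helper names `buildMoveChunk` and `migrateWithBalancerPaused` are hypothetical, the namespace, shard key, and shard name are placeholders taken from the examples above, and the `shell` parameter simply bundles the mongosh globals `sh` and `db` so the flow can be tried against stubs first.

```javascript
// Build a moveChunk command document. All argument values are placeholders.
function buildMoveChunk(ns, shardKeyDoc, targetShard, maxTimeMS) {
  return { moveChunk: ns, find: shardKeyDoc, to: targetShard, maxTimeMS: maxTimeMS };
}

// Stop the balancer, run one manual migration, then always restart the balancer.
// `shell` is expected to look like { sh, db } from a mongosh session on a mongos.
function migrateWithBalancerPaused(shell, cmd, stopTimeoutMs) {
  shell.sh.stopBalancer(stopTimeoutMs); // timeout in milliseconds
  try {
    return shell.db.adminCommand(cmd);
  } finally {
    shell.sh.startBalancer(); // runs even if the migration throws
  }
}

// Usage in mongosh (placeholder values):
// const cmd = buildMoveChunk("mydb.mycoll", { shardKey: 42 }, "targetShard", 60000);
// printjson(migrateWithBalancerPaused({ sh, db }, cmd, 10000));
```

Wrapping the restart in `finally` keeps the balancer from being left disabled by accident, which is the failure mode described under Dead Ends.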
Dead Ends
Common approaches that don't work:
- 90% fail: The timeout is caused by underlying network or resource issues, not stale balancer state; restarting only delays recovery and may lose migration progress.
- 95% fail: The round-trip timeout is a hard-coded balancer limit, not a client-side query timeout; adjusting maxTimeMS does not affect internal balancer logic.
- 80% fail: Stopping all chunk migrations leads to data imbalance, reduced query performance, and eventual cluster instability.