mongodb runtime_error ai_generated partial

MongoServerError: balancer round trip time exceeded 30 seconds: chunk migration timed out for range

ID: mongodb/balancer-round-trip-timeout

Also available as: JSON · Markdown · 中文

78%Fix Rate

85%Confidence

1Evidence

2024-05-12First Seen

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
mongodb 6.0	active	—	—	—
mongodb 7.0	active	—	—	—
mongodb 8.0	active	—	—	—

Root Cause

Chunk migration between shards failed due to network latency or overloaded source shard, exceeding the 30-second round-trip threshold.

generic

中文

由于网络延迟或源分片过载，分片间的块迁移失败，超过了30秒的往返时间阈值。

Official Documentation

https://www.mongodb.com/docs/manual/reference/command/moveChunk/

Workarounds

75% success Temporarily disable the balancer to allow pending migrations to complete: `sh.stopBalancer(10000)` then re-enable after monitoring network latency between shards.
```
Temporarily disable the balancer to allow pending migrations to complete: `sh.stopBalancer(10000)` then re-enable after monitoring network latency between shards.
```
82% success Increase chunk size to reduce migration frequency: `db.adminCommand({ setClusterParameter: { chunkSize: 128 } })` and ensure source shard has sufficient I/O capacity.
```
Increase chunk size to reduce migration frequency: `db.adminCommand({ setClusterParameter: { chunkSize: 128 } })` and ensure source shard has sufficient I/O capacity.
```
85% success Use `db.adminCommand({ moveChunk: 'mydb.mycoll', find: { shardKey: value }, to: 'targetShard', maxTimeMS: 60000 })` to manually migrate chunks with a longer timeout after verifying network health.
```
Use `db.adminCommand({ moveChunk: 'mydb.mycoll', find: { shardKey: value }, to: 'targetShard', maxTimeMS: 60000 })` to manually migrate chunks with a longer timeout after verifying network health.
```

中文步骤

Temporarily disable the balancer to allow pending migrations to complete: `sh.stopBalancer(10000)` then re-enable after monitoring network latency between shards.

Increase chunk size to reduce migration frequency: `db.adminCommand({ setClusterParameter: { chunkSize: 128 } })` and ensure source shard has sufficient I/O capacity.

Use `db.adminCommand({ moveChunk: 'mydb.mycoll', find: { shardKey: value }, to: 'targetShard', maxTimeMS: 60000 })` to manually migrate chunks with a longer timeout after verifying network health.

Dead Ends

Common approaches that don't work:

90% fail
The timeout is caused by underlying network or resource issues, not stale balancer state; restarting only delays recovery and may lose migration progress.
95% fail
The round-trip timeout is a hard-coded balancer limit, not a client-side query timeout; adjusting maxTimeMS does not affect internal balancer logic.
80% fail
Stopping all chunk migrations leads to data imbalance, reduced query performance, and eventual cluster instability.