mongodb runtime_error ai_generated true

MongoServerError: Balancer failed: chunk move failed with error 'ShardNotFound: no config server primary available for shard'

ID: mongodb/sharded-cluster-balancer-stuck

Fix Rate: 85%
Confidence: 85%
Evidence: 1
First Seen: 2024-03-20

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
mongodb-4.2   active
mongodb-4.4   active
mongodb-5.0   active
mongodb-6.0   active
mongodb-7.0   active

Root Cause

The config server replica set has no primary, typically because of network issues or a failed election. The balancer needs a config server primary to move chunks, so every chunk move fails until one is elected.
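
To make the root cause concrete, the sketch below summarizes a document shaped like the output of rs.status() from a config server member: whether a primary exists, which members are unreachable, and whether enough members are healthy to elect one. The function name summarizeReplSet and the sample host names are hypothetical, and the majority calculation assumes every member votes, which may not hold for your replica set configuration.

```javascript
// Summarize an rs.status()-shaped document (uses members[].name, .health, .stateStr).
// Assumption: every member votes, so a majority is floor(n / 2) + 1.
function summarizeReplSet(status) {
  const members = status.members || [];
  const primary = members.find((m) => m.stateStr === "PRIMARY") || null;
  const healthy = members.filter((m) => m.health === 1);
  const majorityNeeded = Math.floor(members.length / 2) + 1;
  return {
    primary: primary ? primary.name : null,
    healthyCount: healthy.length,
    majorityNeeded: majorityNeeded,
    // An election can only succeed when a majority of members is healthy.
    canElect: healthy.length >= majorityNeeded,
    unreachable: members.filter((m) => m.health !== 1).map((m) => m.name),
  };
}

// Hypothetical sample: one of three config servers is down and no primary exists yet.
const sampleStatus = {
  set: "configReplSet",
  members: [
    { name: "cfg1.example.net:27019", health: 1, stateStr: "SECONDARY" },
    { name: "cfg2.example.net:27019", health: 1, stateStr: "SECONDARY" },
    { name: "cfg3.example.net:27019", health: 0, stateStr: "(not reachable/healthy)" },
  ],
};
console.log(summarizeReplSet(sampleStatus));
```

In mongosh connected directly to a config server member, summarizeReplSet(rs.status()) would give the same summary for the live set. If canElect is false, restoring the unreachable members is the first thing to try.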

generic

Official Documentation

https://www.mongodb.com/docs/manual/reference/command/balancerStart/

Workarounds

  1. 80% success Check the config server replica set status by running rs.status() on a config server member. If there is no primary, confirm that a majority of voting members are up and can reach each other; the set elects a primary on its own once a majority is available. Note that rs.stepDown() only works on a current primary, so it cannot force an election when none exists. If members are hung, restart the config server nodes one at a time.
  2. 85% success Temporarily disable the balancer to reduce load: sh.stopBalancer(). Fix the config server issue, then re-enable the balancer: sh.startBalancer().
  3. 75% success Ensure the config server nodes have sufficient resources and network connectivity, and check their logs for election problems. Example: grep -i 'election' /var/log/mongodb/mongod.log
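
The ordering in workaround 2 (re-enable the balancer only after the config server set has a primary again) can be sketched as a small polling helper. This is an illustration under assumptions, not an official procedure: resumeBalancerWhenPrimary and its parameters are hypothetical names, and the status source and balancer call are passed in as functions so the logic can be tried without a cluster. In mongosh they might wrap rs.status() on a config server connection and sh.startBalancer() on a mongos connection.

```javascript
// Poll a status source until a PRIMARY appears, then re-enable the balancer.
// getStatus:     returns an rs.status()-shaped document (hypothetical wrapper)
// startBalancer: called once a primary is seen, e.g. () => sh.startBalancer()
// maxAttempts:   how many polls to make before giving up
// onWait:        optional pause between polls, e.g. () => sleep(5000) in mongosh
function resumeBalancerWhenPrimary(getStatus, startBalancer, maxAttempts, onWait) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const members = getStatus().members || [];
    const primary = members.find((m) => m.stateStr === "PRIMARY");
    if (primary) {
      startBalancer();
      return { resumed: true, primary: primary.name, attempts: attempt };
    }
    // Pause between polls, but not after the final one.
    if (onWait && attempt < maxAttempts) onWait(attempt);
  }
  return { resumed: false, primary: null, attempts: maxAttempts };
}

// Dry run with stand-in data: the primary shows up on the third poll.
const fakePolls = [
  { members: [{ name: "cfg1.example.net:27019", stateStr: "SECONDARY" }] },
  { members: [{ name: "cfg1.example.net:27019", stateStr: "SECONDARY" }] },
  { members: [{ name: "cfg1.example.net:27019", stateStr: "PRIMARY" }] },
];
let fakePollIndex = 0;
const dryRun = resumeBalancerWhenPrimary(
  () => fakePolls[Math.min(fakePollIndex++, fakePolls.length - 1)],
  () => console.log("sh.startBalancer() would run here"),
  5
);
console.log(dryRun);
```

Bounding the number of polls keeps the helper from waiting forever when the set cannot reach a majority; in that case the returned resumed: false is the cue to go back to workaround 1.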

Dead Ends

Common approaches that don't work:

  1. 90% fail

    The error indicates no config server primary; manual moves also require a healthy config server.

  2. 80% fail

    The balancer cannot start if the config server has no primary; this only re-enables the scheduler.

  3. 100% fail

    This causes data loss and does not fix the underlying config server issue.