mongodb runtime_error ai_generated true

MongoServerError: Transaction aborted due to shard chunk migration in progress

ID: mongodb/transaction-abort-on-shard-migration

Fix rate: 85%
Confidence: 84%
Evidence: 1
First seen: 2023-11-05

Version Compatibility

Version      | Status | Introduced | Deprecated | Notes
MongoDB 5.0  | active |            |            |
MongoDB 6.0  | active |            |            |
MongoDB 7.0  | active |            |            |

Root Cause

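A multi-document transaction accessed a chunk that was being migrated between shards. When a balancer-initiated chunk migration interleaves with the transaction (for example, the migration completes before the transaction takes its lock on the collection), the server aborts the transaction, typically when it tries to commit.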



Official Documentation

https://www.mongodb.com/docs/manual/core/sharded-cluster-balancer/

Workarounds

  1. (90% success) Retry the whole transaction with exponential backoff and a retry limit. Abort the failed attempt before retrying, because startTransaction() throws while a transaction is still open on the session: let retries = 0; while (true) { try { session.startTransaction(); /* ... */ session.commitTransaction(); break; } catch (e) { try { session.abortTransaction(); } catch (_) { /* already aborted */ } if ((e.code === 251 || e.hasErrorLabel?.("TransientTransactionError")) && retries < 5) { sleep(1000 * Math.pow(2, retries)); retries++; } else throw e; } }
  2. (85% success) Temporarily disable the balancer during critical transaction windows, and re-enable it even if the transactions fail: sh.stopBalancer(); try { /* run transactions */ } finally { sh.startBalancer(); }
  3. (75% success) Check the config.changelog collection for recent chunk migrations and schedule or route transactions away from chunks that are being moved, for example: db.getSiblingDB("config").changelog.find({ what: /^moveChunk/ }).sort({ time: -1 }).limit(10). The changelog records migration events as they are logged, so it reveals recent activity and migration patterns rather than guaranteeing that a chunk is stable.
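The retry loop in workaround 1 can also be written as a reusable Node.js helper. This is a minimal sketch, not an official API: `runWithTransientRetry` and `isTransientTransactionError` are hypothetical names, and the sketch assumes the abort error carries the `TransientTransactionError` error label (exposed by the Node.js driver through `hasErrorLabel()` or `errorLabels`), which MongoDB uses to mark a transaction as safe to retry from the beginning. The `wait` option exists only so the backoff can be exercised without real delays.

```javascript
// Hypothetical helper (not part of any MongoDB library): retries a whole
// transaction attempt when it fails with a transient transaction error.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: transient errors are marked with the TransientTransactionError
// label, readable via hasErrorLabel() or the errorLabels array.
function isTransientTransactionError(err) {
  if (!err) return false;
  if (typeof err.hasErrorLabel === "function") {
    return err.hasErrorLabel("TransientTransactionError");
  }
  return Array.isArray(err.errorLabels) &&
    err.errorLabels.includes("TransientTransactionError");
}

// Runs `runTransaction` and retries it with exponential backoff
// (baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...) up to maxRetries times.
// Non-transient errors and the final failure are rethrown unchanged.
async function runWithTransientRetry(
  runTransaction,
  { maxRetries = 5, baseDelayMs = 1000, wait = sleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      if (!isTransientTransactionError(err) || attempt >= maxRetries) {
        throw err;
      }
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Each attempt passed to the helper should start its own transaction on the session and abort it on failure (call `session.abortTransaction()` in a `catch` while `session.inTransaction()` is true). For most applications the Node.js driver's built-in `session.withTransaction(callback)` is simpler, since it already retries on `TransientTransactionError`; a hand-written loop mainly helps when you want explicit control over the backoff and the retry limit.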
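If you do pause the balancer (workaround 2), the restart belongs in a `finally` block so that a failing transaction cannot leave the balancer disabled. The sketch below assumes the Node.js driver, where `adminDb` would be `client.db("admin")` connected through `mongos`; `balancerStop` and `balancerStart` are the `mongos` admin commands corresponding to `sh.stopBalancer()` and `sh.startBalancer()`, and `withBalancerStopped` is a hypothetical name.

```javascript
// Hypothetical helper: stops the balancer, runs `work`, and always restarts
// the balancer afterwards, even if `work` throws.
// Assumption: `adminDb` behaves like the Node.js driver's client.db("admin"),
// i.e. it has an async command(doc) method and is connected to a mongos.
async function withBalancerStopped(adminDb, work) {
  await adminDb.command({ balancerStop: 1 });
  try {
    return await work();
  } finally {
    // Runs on success and on failure, so the balancer is never left off.
    await adminDb.command({ balancerStart: 1 });
  }
}
```

Keep the wrapped window short: a stopped balancer is a stopgap, not a fix, because chunks are not rebalanced while it is off.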
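For workaround 3, the changelog check can be reduced to a small filter over `config.changelog` documents. This is a sketch under assumptions: it relies on the `what`, `ns`, and `time` fields of changelog entries and on migration events having `what` values that begin with `moveChunk`; `recentlyMigratedNamespaces` and its `windowMs` option are hypothetical. With the Node.js driver the input could come from `client.db("config").collection("changelog").find({ what: /^moveChunk/ }).sort({ time: -1 }).limit(100).toArray()`.

```javascript
// Hypothetical helper: returns the namespaces (e.g. "app.orders") that logged a
// moveChunk.* event within the last `windowMs` milliseconds before `now`.
// Assumption: each entry has string `what`, string `ns`, and a Date-like `time`.
function recentlyMigratedNamespaces(
  changelogDocs,
  { now = new Date(), windowMs = 10 * 60 * 1000 } = {}
) {
  const cutoff = now.getTime() - windowMs;
  const namespaces = new Set();
  for (const doc of changelogDocs) {
    // Skip non-migration events such as splits.
    if (typeof doc.what !== "string" || !doc.what.startsWith("moveChunk")) continue;
    const t = new Date(doc.time).getTime();
    // Skip entries with unparseable timestamps or outside the window.
    if (Number.isNaN(t) || t < cutoff) continue;
    namespaces.add(doc.ns);
  }
  return [...namespaces].sort();
}
```

Because the changelog only records events that have already been logged, an empty result does not guarantee that no migration is in progress; treat it as a scheduling hint rather than a lock.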

Dead Ends

Common approaches that don't work:

  1. (50% fail) Stopping the balancer does not fix the transaction; it only hides the symptom, and leaving it off lets data become unevenly distributed across shards, which can cause performance problems.

  2. (90% fail) The abort is triggered by the migration process, not by a timeout, so adjusting transaction timeouts does not prevent it.

  3. (70% fail) The migration is asynchronous and may take seconds to minutes to complete.