mongodb system_error ai_generated partial

WiredTiger error: checkpoint stall detected: unable to create checkpoint within 60 seconds

ID: mongodb/wiredtiger-checkpoint-stall

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2023-08-20

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
MongoDB 6.0   active
MongoDB 7.0   active
MongoDB 8.0   active

Root Cause

High write load or disk I/O contention prevents WiredTiger from completing a checkpoint within the configured timeout.
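
As a rough illustration of why write volume and disk speed interact here, the sketch below estimates a lower bound on checkpoint time as dirty data divided by sustained disk write throughput. The function names, the 60-second default (taken from the error text), and the sample figures are assumptions for illustration, not measurements or MongoDB internals.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_checkpoint_seconds(dirty_bytes: float, disk_write_bytes_per_sec: float) -> float:
    """Back-of-the-envelope lower bound: time to flush the dirty data at a given throughput."""
    if disk_write_bytes_per_sec <= 0:
        raise ValueError("disk throughput must be positive")
    return dirty_bytes / disk_write_bytes_per_sec


def would_exceed_timeout(dirty_bytes: float, disk_write_bytes_per_sec: float,
                         timeout_secs: float = 60.0) -> bool:
    """True if the estimated flush time is longer than the timeout (60 s in the error message)."""
    return estimate_checkpoint_seconds(dirty_bytes, disk_write_bytes_per_sec) > timeout_secs


# Hypothetical figures: 8 GiB of dirty data on a disk sustaining 100 MiB/s vs. 400 MiB/s.
slow_disk = estimate_checkpoint_seconds(8 * GIB, 100 * MIB)   # 81.92 seconds
fast_disk = estimate_checkpoint_seconds(8 * GIB, 400 * MIB)   # 20.48 seconds
print(f"{slow_disk:.2f}s stall={would_exceed_timeout(8 * GIB, 100 * MIB)}")
print(f"{fast_disk:.2f}s stall={would_exceed_timeout(8 * GIB, 400 * MIB)}")
```

The estimate ignores concurrent reads, eviction, and journal writes, so real checkpoints on a contended disk are likely to take longer than this bound.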


Workarounds

  1. 75% success: Check whether storage is the bottleneck. Use iostat to watch disk utilization and latency, and inspect db.serverStatus().wiredTiger for cache pressure and checkpoint timings. Note that wiredTiger.concurrentTransactions reports read/write ticket usage, not disk I/O. Upgrade to faster storage (e.g., NVMe SSDs) if the disk is saturated.
  2. 70% success: Reduce write load by throttling application writes. Lowering the write concern (e.g., w:1 instead of w:majority) shortens the time each write waits for acknowledgment, but it weakens durability guarantees and does not reduce the volume of data a checkpoint must flush.
  3. 80% success: Increase the checkpoint timeout via storage.wiredTiger.engineConfig.checkpointWaitTimeoutSecs in the configuration file. Verify this option name against the documentation for your MongoDB version before relying on it; it is not among the commonly documented storage.wiredTiger.engineConfig settings.
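
The monitoring step can be sketched as a small helper that pulls a few checkpoint-related numbers out of a serverStatus document. This is a minimal sketch: the helper name and the sample document are hypothetical, and the statistic names used are the ones WiredTiger commonly reports, which may differ between MongoDB versions, so missing fields are returned as None rather than raising.

```python
from typing import Any, Dict, Optional


def summarize_checkpoint_pressure(status: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Extract dirty-cache ratio, last checkpoint duration, and free write tickets."""
    wt = status.get("wiredTiger", {})
    cache = wt.get("cache", {})
    txn = wt.get("transaction", {})
    write_tickets = wt.get("concurrentTransactions", {}).get("write", {})

    dirty = cache.get("tracked dirty bytes in the cache")
    max_bytes = cache.get("maximum bytes configured")
    # Only compute the ratio when both values are present and the maximum is non-zero.
    dirty_ratio = dirty / max_bytes if dirty is not None and max_bytes else None

    return {
        "dirty_cache_ratio": dirty_ratio,
        "last_checkpoint_msecs": txn.get("transaction checkpoint most recent time (msecs)"),
        "write_tickets_available": write_tickets.get("available"),
    }


# Hypothetical sample shaped like a trimmed serverStatus result.
sample_status = {
    "wiredTiger": {
        "cache": {
            "tracked dirty bytes in the cache": 1_073_741_824,
            "maximum bytes configured": 4_294_967_296,
        },
        "transaction": {"transaction checkpoint most recent time (msecs)": 45_000},
        "concurrentTransactions": {"write": {"out": 120, "available": 8, "totalTickets": 128}},
    }
}
print(summarize_checkpoint_pressure(sample_status))

# With PyMongo installed and a reachable server (both assumptions), a live document
# could be fetched with:
#   from pymongo import MongoClient
#   status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")
```

A dirty ratio that keeps climbing alongside checkpoint durations approaching the timeout would point toward storage throughput rather than application logic.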

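
One way to throttle application writes, as suggested in the workarounds, is a client-side token bucket. The class below is a generic sketch of that technique and is not part of any MongoDB driver; the class name, parameters, and the injectable clock and sleep functions are all illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side rate limiter: each write consumes tokens that refill at a fixed rate."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate_per_sec <= 0 or capacity <= 0:
            raise ValueError("rate_per_sec and capacity must be positive")
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last refill, up to capacity."""
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self, tokens: float = 1.0) -> float:
        """Block until the requested tokens are available; return the seconds waited."""
        if tokens > self.capacity:
            raise ValueError("cannot acquire more tokens than the bucket capacity")
        waited = 0.0
        while True:
            self._refill()
            if self._tokens >= tokens:
                self._tokens -= tokens
                return waited
            shortfall = (tokens - self._tokens) / self.rate
            self._sleep(shortfall)
            waited += shortfall


# Usage sketch (collection and docs are hypothetical): cap inserts at about 500 per second.
#   bucket = TokenBucket(rate_per_sec=500, capacity=500)
#   for doc in docs:
#       bucket.acquire()
#       collection.insert_one(doc)
```

Throttling at the client smooths bursts so dirty data accumulates more slowly between checkpoints; it does not reduce the total amount of data written.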

Dead Ends

Common approaches that don't work:

  1. 70% fail: Reducing the WiredTiger cache size. A smaller cache increases I/O pressure and can make checkpoint stalls worse.

  2. 80% fail: Disabling journaling. Checkpoints are still required for durability, so disabling journaling does not remove the checkpoint requirement.

  3. 60% fail: Lengthening the checkpoint interval. Longer intervals can accumulate more dirty data, making each checkpoint even slower.
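
The reasoning behind the last dead end can be made concrete with simple arithmetic. The sketch below assumes a deliberately simplified model in which dirty data grows linearly with the write rate until it reaches a cap on dirty cache content; the function name, the cap, and the sample figures are hypothetical, and the model ignores background eviction.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_bytes_at_checkpoint(write_bytes_per_sec: float, interval_secs: float,
                              dirty_limit_bytes: float) -> float:
    """Dirty data waiting at checkpoint time: rate times interval, capped at the dirty limit."""
    if write_bytes_per_sec < 0 or interval_secs < 0:
        raise ValueError("write rate and interval must be non-negative")
    return min(write_bytes_per_sec * interval_secs, dirty_limit_bytes)


# Hypothetical workload: 20 MiB/s of writes and a 4 GiB cap on dirty cache content.
for interval in (60, 120, 300):
    pending = dirty_bytes_at_checkpoint(20 * MIB, interval, 4 * GIB)
    print(f"interval={interval}s -> {pending / MIB:.0f} MiB to flush")
# interval=60s  -> 1200 MiB to flush
# interval=120s -> 2400 MiB to flush
# interval=300s -> 4096 MiB to flush
```

Under this model, each checkpoint has more to flush as the interval grows, up to the cap, which is why a longer interval tends to move the stall rather than remove it.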