mongodb · resource_error · ai_generated: true

WiredTiger error (0) - cache stuck: eviction worker thread stalled for 120 seconds

ID: mongodb/wiredtiger-cache-eviction-stall

Fix rate: 80%
Confidence: 85%
Evidence: 1
First seen: 2023-09-01

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
mongodb-4.2   active
mongodb-4.4   active
mongodb-5.0   active
mongodb-6.0   active
mongodb-7.0   active

Root Cause

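The WiredTiger cache is full and the eviction threads cannot free space as fast as the workload fills it. This usually happens under sustained heavy writes, which leave many dirty pages that must be written out before they can be evicted, or when the cache is too small for the working set. Once cache usage crosses WiredTiger's eviction trigger, application threads are pulled into eviction work, so client operations stall.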
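To tell whether the cache is actually saturated or merely busy, the counters in the `wiredTiger.cache` sub-document of serverStatus can be compared against WiredTiger's eviction thresholds. The sketch below is an illustration, not a MongoDB tool: the helper name `cache_pressure` and its state labels are hypothetical, and the threshold constants are WiredTiger's documented defaults (`eviction_target`, `eviction_trigger`, `eviction_dirty_target`, `eviction_dirty_trigger`), which a given MongoDB build or deployment may override.

```python
from typing import Dict, Mapping

# Assumption: WiredTiger's documented default eviction thresholds, as a
# percentage of the configured cache size. Verify against your deployment.
EVICTION_TARGET = 80.0   # background eviction starts above this usage
EVICTION_TRIGGER = 95.0  # application threads help evict above this usage
DIRTY_TARGET = 5.0       # same idea, for dirty bytes
DIRTY_TRIGGER = 20.0


def cache_pressure(cache: Mapping[str, float]) -> Dict[str, object]:
    """Classify the serverStatus `wiredTiger.cache` sub-document (hypothetical helper)."""
    max_bytes = cache["maximum bytes configured"]
    if max_bytes <= 0:
        raise ValueError("maximum bytes configured must be positive")
    used_pct = 100.0 * cache["bytes currently in the cache"] / max_bytes
    dirty_pct = 100.0 * cache["tracked dirty bytes in the cache"] / max_bytes

    if used_pct >= EVICTION_TRIGGER or dirty_pct >= DIRTY_TRIGGER:
        state = "application_eviction"  # client threads are likely doing eviction
    elif used_pct >= EVICTION_TARGET or dirty_pct >= DIRTY_TARGET:
        state = "background_eviction"   # eviction threads are active
    else:
        state = "ok"
    return {"used_pct": round(used_pct, 1), "dirty_pct": round(dirty_pct, 1), "state": state}
```

With PyMongo, the input could come from `MongoClient(uri).admin.command("serverStatus")["wiredTiger"]["cache"]`; in mongosh the same document is `db.serverStatus().wiredTiger.cache`. A state that sits at `application_eviction` for long periods is consistent with the stall described above.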

generic


Official Documentation

https://www.mongodb.com/docs/manual/reference/log-messages/#std-label-log-message-wt-eviction-stall

Workarounds

  1. (70% success) Increase the WiredTiger cache size in mongod.conf by setting storage.wiredTiger.engineConfig.cacheSizeGB (for example, cacheSizeGB: 4), sized to the RAM available on the host, then restart mongod. By default the cache is the larger of 50% of (RAM - 1 GB) or 256 MB.
  2. (80% success) Reduce the write load, for example by batching writes or routing them through a write queue, and watch cache usage with db.serverStatus().wiredTiger.cache.
  3. (75% success) Add secondary nodes to take over read load and reduce cache pressure on the primary, e.g. rs.add('newSecondary:27017'). This only helps if clients use a read preference that routes reads to secondaries (such as secondaryPreferred); with the default, primary, all reads still go to the primary. It does not reduce write-driven pressure, because every member applies all writes.
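Workaround 2 mentions batching writes or using a write queue. Below is a minimal sketch of that idea, assuming a hypothetical `BatchingWriter` class (not a MongoDB or PyMongo API) that buffers documents and hands them to a caller-supplied sink in fixed-size batches. Batching mainly cuts per-operation overhead; the optional pause between flushes is what actually limits how fast new dirty data reaches the cache. The default `batch_size` and `pause_s` values are illustrative, not tuned recommendations.

```python
import time
from typing import Any, Callable, List


class BatchingWriter:
    """Hypothetical write queue: buffers documents and passes them to `sink`
    in batches of at most `batch_size`, optionally sleeping after each flush
    as a crude throttle on the write rate."""

    def __init__(self, sink: Callable[[List[Any]], None],
                 batch_size: int = 500, pause_s: float = 0.0) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink
        self._batch_size = batch_size
        self._pause_s = pause_s
        self._buffer: List[Any] = []

    def write(self, doc: Any) -> None:
        """Queue one document; flush automatically once a batch is full."""
        self._buffer.append(doc)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered documents to the sink as one batch."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._sink(batch)
        if self._pause_s > 0:
            time.sleep(self._pause_s)  # pause between batches to cap throughput

    def __enter__(self) -> "BatchingWriter":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.flush()  # send the final partial batch on exit
```

With PyMongo, the sink could be `lambda docs: coll.insert_many(docs, ordered=False)` for some `Collection` named `coll`. Watching cache usage while adjusting `pause_s` shows whether the throttle is relieving eviction pressure.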

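Adding secondaries (workaround 3) only takes reads off the primary when clients are told to read from secondaries, because MongoDB's default read preference is `primary`. A minimal sketch that sets the standard `readPreference` connection-string option follows; the helper name `with_read_preference` is hypothetical, and the host names in the usage example are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Read preference modes defined by MongoDB.
READ_PREFERENCE_MODES = {"primary", "primaryPreferred", "secondary",
                         "secondaryPreferred", "nearest"}


def with_read_preference(uri: str, mode: str) -> str:
    """Return `uri` with its readPreference option set to `mode` (hypothetical helper)."""
    if mode not in READ_PREFERENCE_MODES:
        raise ValueError(f"unknown read preference mode: {mode!r}")
    parts = urlsplit(uri)
    # Connection-string option names are case-insensitive: drop any existing one.
    options = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
               if k.lower() != "readpreference"]
    options.append(("readPreference", mode))
    # The connection-string format needs a "/" between the host list and "?".
    path = parts.path or "/"
    query = urlencode(options, safe=":,")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For example, `with_read_preference("mongodb://h1:27017,h2:27017/?replicaSet=rs0", "secondaryPreferred")` yields a URI ending in `replicaSet=rs0&readPreference=secondaryPreferred`. PyMongo's `MongoClient` also accepts `readPreference="secondaryPreferred"` as a keyword argument. Reads served by secondaries can return slightly stale data, so this suits workloads that tolerate replication lag.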

Dead Ends

Common approaches that don't work:

  1. Reducing cacheSizeGB (80% fail)

    A smaller cache makes eviction pressure worse: the error is caused by too little cache, not too much.

  2. 90% fail

    This increases the risk of data loss and does not address the eviction bottleneck.

  3. Raising cacheSizeGB on a memory-constrained host (50% fail)

    When memory is tight, a larger cache can get mongod killed by the OOM killer; the root cause is often eviction efficiency rather than cache size.
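Whether a larger cache helps or triggers OOM kills depends on how much RAM is really available to mongod. As a pre-flight check before editing cacheSizeGB, the sketch below reproduces MongoDB's documented default (the larger of 50% of (RAM - 1 GB) or 256 MB) and compares a proposed value against a memory budget. `fits_memory_budget` and its `max_fraction` of 0.6 are hypothetical, an illustrative headroom rule rather than MongoDB guidance, and `ram_gb` is assumed to be the memory actually available to mongod (for example, a container limit).

```python
def default_cache_size_gb(ram_gb: float) -> float:
    """MongoDB's documented default WiredTiger cache size:
    the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB)."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


def fits_memory_budget(proposed_gb: float, ram_gb: float,
                       max_fraction: float = 0.6) -> bool:
    """Hypothetical check before raising cacheSizeGB.

    `max_fraction` is an illustrative assumption, not a MongoDB rule: RAM
    beyond it is left for the filesystem cache, connections, and other
    processes on the host.
    """
    if proposed_gb <= 0 or ram_gb <= 0:
        raise ValueError("sizes must be positive")
    return proposed_gb <= max_fraction * ram_gb
```

On a 16 GB host the default works out to 7.5 GB. With the assumed 0.6 budget, cacheSizeGB: 4 on an 8 GB host passes the check, while 6 does not.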