mongodb runtime_error ai_generated partial

MongoServerError: ChangeStream error: resume token from WiredTiger is stale: token timestamp 1234567890 is older than oldest timestamp 1234567895

ID: mongodb/change-stream-resume-token-wiredtiger-stale

Fix rate: 75%
Confidence: 87%
Evidence: 1
First seen: 2024-11-05

Version Compatibility

Version | Status | Introduced | Deprecated | Notes
mongodb 6.0 | active | | |
mongodb 7.0 | active | | |
mongodb 8.0 | active | | |

Root Cause

The resume token's timestamp has been pruned from WiredTiger's history. The storage engine's oldest timestamp advanced past it (for example under checkpoint or replication pressure), so the change stream can no longer resume from that point.
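
The check reported in the error reduces to comparing two timestamps. The sketch below is a hypothetical helper, not a MongoDB API: it assumes the message wording quoted at the top of this entry (other server versions may phrase it differently, and the message does not state the timestamp units) and extracts how far the token lags behind the oldest retained timestamp.

```javascript
// Hypothetical helper: parse the two timestamps out of the error text quoted
// in this entry. Returns null when the message does not match that wording.
function parseStaleTokenError(message) {
  const m = /token timestamp (\d+) is older than oldest timestamp (\d+)/.exec(message);
  if (m === null) return null;
  const tokenTs = Number(m[1]);
  const oldestTs = Number(m[2]);
  // behindBy: how far the resume point lags the oldest retained timestamp.
  return { tokenTs, oldestTs, behindBy: oldestTs - tokenTs };
}

// As the error describes it, a resume point is usable only if it is not
// older than the oldest timestamp the storage engine still retains.
function isResumePointRetained(tokenTs, oldestTs) {
  return tokenTs >= oldestTs;
}
```

For the example message above, `behindBy` is 5, which gives a rough sense of how much extra retention would have been needed.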



Official Documentation

https://www.mongodb.com/docs/manual/changeStreams/#resume-tokens

Workarounds

  1. 85% success: Increase the oplog size to retain more history, for example `db.adminCommand({ replSetResizeOplog: 1, size: 40960 })` (the size is in megabytes, so this is 40 GB; run it on each replica set member). Also keep WiredTiger snapshot history for one hour (the default is 300 seconds) with `db.adminCommand({ setParameter: 1, minSnapshotHistoryWindowInSeconds: 3600 })`. A longer history window costs additional disk space.
  2. 90% success: Implement fallback logic in the application. Once this error occurs, the saved token cannot be reused: passing it again through `resumeAfter` or `startAfter` fails the same way. Instead, open a new change stream with no resume option, such as `db.collection.watch([])`, which starts at the current time. Alternatively, pass `{ startAtOperationTime: Timestamp(...) }` with a time that is still retained. Then reprocess or re-read the data that changed during the gap.
  3. 70% success: Consume the change stream and persist its resume token more frequently (for example every 5 seconds instead of every 30 seconds) so that the stored token stays within the retention window.
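
For workaround 1, the 40 GB figure is only an example; a size can instead be estimated from the observed oplog growth rate. The sketch below is a hypothetical helper, not a MongoDB API. It assumes the `usedMB` and `timeDiff` (seconds) fields reported by `db.getReplicationInfo()` in mongosh, an arbitrary 1.5x headroom factor, and the 990 MB minimum that `replSetResizeOplog` accepts.

```javascript
// replSetResizeOplog rejects sizes below 990 MB.
const MIN_OPLOG_MB = 990;

// Hypothetical helper: estimate an oplog size (MB) that covers `targetHours`
// of writes at the rate observed in the current oplog window.
function estimateOplogSizeMB(replInfo, targetHours, headroom = 1.5) {
  if (!(replInfo.timeDiff > 0)) {
    throw new Error("oplog window is empty; cannot estimate a write rate");
  }
  // Multiply before dividing so integer inputs stay exact in floating point.
  const needed = (replInfo.usedMB * targetHours * 3600 * headroom) / replInfo.timeDiff;
  return Math.max(MIN_OPLOG_MB, Math.ceil(needed));
}

// Builds the two admin command documents used in workaround 1.
function retentionCommands(sizeMB, snapshotWindowSeconds) {
  return [
    { replSetResizeOplog: 1, size: sizeMB },
    { setParameter: 1, minSnapshotHistoryWindowInSeconds: snapshotWindowSeconds },
  ];
}
```

In mongosh, after defining these functions, `retentionCommands(estimateOplogSizeMB(db.getReplicationInfo(), 24), 3600).forEach((c) => printjson(db.adminCommand(c)))` would apply both settings on the connected member. Repeat the command on every member.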

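The fallback in workaround 2 can be sketched for the Node.js driver as follows. This is a minimal sketch under several assumptions:

- `collection` is a driver `Collection`, or any object with the same `watch()`/`tryNext()`/`close()` methods.
- The server reports the stale token either with code 286 (`ChangeStreamHistoryLost`) or with wording like the message quoted in this entry.
- `reconcileGap` is a hypothetical application callback that re-reads the data changed during the gap.

```javascript
// Assumed error code for a resume point that has aged out of retained history.
const CHANGE_STREAM_HISTORY_LOST = 286;

function isStaleResumeToken(err) {
  if (err == null) return false;
  if (err.code === CHANGE_STREAM_HISTORY_LOST) return true;
  // Fallback: match the wording quoted in this entry or the classic oplog message.
  return /resume token .*stale|no longer be in the oplog/i.test(String(err.message));
}

// Resume from savedToken when possible; if the token was pruned, start a
// fresh stream at the current time and let the application fill the gap.
async function openWithFallback(collection, savedToken, reconcileGap) {
  if (savedToken != null) {
    const resumed = collection.watch([], { resumeAfter: savedToken });
    try {
      // The first fetch sends the aggregate, so the server validates the token here.
      const first = await resumed.tryNext();
      return { stream: resumed, first, resumed: true };
    } catch (err) {
      await resumed.close();
      if (!isStaleResumeToken(err)) throw err; // unrelated errors propagate
    }
  }
  const fresh = collection.watch([]);
  // Start the fresh stream before reconciling so changes after this point are captured.
  const first = await fresh.tryNext();
  if (savedToken != null) await reconcileGap();
  return { stream: fresh, first, resumed: false };
}
```

Opening the fresh stream before reconciling means some changes may be seen twice, once by `reconcileGap` and once by the stream. The reconciliation and the event handler therefore should be idempotent.
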

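The more frequent token saving in workaround 3 can be sketched as a small throttle. This is a hypothetical helper, not a driver API. `save` is an assumed application callback (for example an upsert into a checkpoints collection), and the 5-second default comes from the example above.

```javascript
// Hypothetical helper: persist the newest resume token at most once per intervalMs.
// `now` is injectable so the timing can be tested without real clocks.
class ResumeTokenCheckpointer {
  constructor(save, intervalMs = 5000, now = () => Date.now()) {
    this.save = save;
    this.intervalMs = intervalMs;
    this.now = now;
    this.lastSavedAt = -Infinity; // forces a save on the first observed token
    this.pending = null;          // newest token not yet persisted
  }

  // Call after each processed event with the stream's current resume token.
  async observe(token) {
    this.pending = token;
    if (this.now() - this.lastSavedAt >= this.intervalMs) {
      await this.flush();
    }
  }

  // Persist the pending token right away (also call this before shutdown).
  async flush() {
    if (this.pending == null) return;
    const token = this.pending;
    this.pending = null; // a token observed during save() stays pending
    await this.save(token);
    this.lastSavedAt = this.now();
  }
}
```

In a Node.js consumer, one would call `await checkpointer.observe(stream.resumeToken)` after handling each change inside `for await (const change of stream)`, and `await checkpointer.flush()` before exiting. This limits how stale the stored token is while the consumer runs. It does not help if the consumer stays down longer than the retention window.
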
Dead Ends

Common approaches that don't work:

  1. 95% fail: Retrying with the same resume token

    The token is already pruned from the oplog and WiredTiger; retrying with the same token will result in the same error.

  2. 80% fail: Changing the checkpoint frequency

    Checkpoint frequency does not directly control timestamp pruning; the oldest timestamp is managed by replication and read concern settings.

  3. 90% fail

    This is not configurable and would break point-in-time read operations; the error is a symptom of a broader retention issue.