elasticsearch
system_error
ai_generated
partial
TranslogCorruptedException: translog corruption detected at position 67890 while recovering index [my_index] shard [0]
ID: elasticsearch/translog-corruption-during-recovery
70% Fix Rate
86% Confidence
1 Evidence
2023-09-05 First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Elasticsearch 7.16.0 | active | — | — | — |
| Elasticsearch 8.8.0 | active | — | — | — |
| Elasticsearch 8.15.0 | active | — | — | — |
Root Cause
The transaction log file for a shard is corrupted, often due to abrupt node shutdown, disk errors, or filesystem issues, preventing shard recovery.
Official Documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/translog.html#translog-corruption
Workarounds
- 80% success: Use the Elasticsearch CLI tool `elasticsearch-shard` to truncate the translog. Stop the node that holds the shard copy first, then run: `bin/elasticsearch-shard remove-corrupted-data --index my_index --shard-id 0`. This removes the corrupted translog entries, so recent operations that existed only in the translog are lost. The tool also changes the shard's allocation ID, so after restarting the node, use the cluster reroute API (`allocate_stale_primary` with `accept_data_loss: true`) to let the shard recover from the repaired copy.
- 72% success: Restore the index from a snapshot. If a snapshot exists, delete the corrupt index (`DELETE /my_index`), then restore it: `POST /_snapshot/my_repo/my_snapshot/_restore {"indices": "my_index"}`. No rename parameters are needed when restoring under the original name. Operations indexed after the snapshot was taken are not recovered, so check that the snapshot is recent enough.
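After `elasticsearch-shard` has removed the corrupted data, the repaired copy usually has to be allocated explicitly through the cluster reroute API. The following is a minimal sketch of that request, not a definitive procedure. The node name `node-1` and the base URL `http://localhost:9200` are placeholders, and `build_stale_primary_reroute` and `post_json` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request


def build_stale_primary_reroute(index: str, shard: int, node: str) -> dict:
    """Build the body for POST /_cluster/reroute that accepts the repaired shard copy."""
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Required by Elasticsearch: acknowledges that recent
                    # operations removed from the translog are gone.
                    "accept_data_loss": True,
                }
            }
        ]
    }


def post_json(base_url: str, path: str, body: dict) -> dict:
    """POST a JSON body to the cluster and return the decoded JSON response."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "node-1" is an assumed node name; use the node that holds the repaired copy.
    body = build_stale_primary_reroute("my_index", 0, "node-1")
    print(json.dumps(body, indent=2))
    # Uncomment to send the request to a real cluster (assumed URL, no auth):
    # post_json("http://localhost:9200", "/_cluster/reroute", body)
```

Printing the body before sending it makes it easy to review the `accept_data_loss` acknowledgement, which is the point at which the data loss becomes permanent.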
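The snapshot workaround is a two-step sequence: delete the corrupt index, then restore it from the repository. A rough sketch of that sequence follows, assuming an unsecured cluster at `http://localhost:9200` and the placeholder names `my_repo` and `my_snapshot` from the text above. `build_restore_plan` and `run_plan` are hypothetical helpers, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_restore_plan(repo: str, snapshot: str, index: str) -> list:
    """Return ordered (method, path, body) steps: delete the index, then restore it."""
    return [
        # Step 1: remove the corrupt index so the restore can recreate it.
        ("DELETE", f"/{index}", None),
        # Step 2: restore only this index; block until the restore finishes.
        (
            "POST",
            f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
            {"indices": index, "include_global_state": False},
        ),
    ]


def run_plan(base_url: str, plan: list) -> None:
    """Send each step in order; urlopen raises on any HTTP error status."""
    for method, path, body in plan:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            url=base_url.rstrip("/") + path,
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(request) as response:
            print(method, path, response.status)


if __name__ == "__main__":
    plan = build_restore_plan("my_repo", "my_snapshot", "my_index")
    for method, path, body in plan:
        print(method, path, json.dumps(body) if body is not None else "")
    # Uncomment to run against a real cluster (assumed URL, no auth):
    # run_plan("http://localhost:9200", plan)
```

Because `run_plan` stops at the first failed request, a failed delete never leads to a restore attempt against an index that still exists.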
Dead Ends
Common approaches that don't work:
- 85% fail: This may cause data loss and prevent the shard from recovering at all because Elasticsearch expects a valid translog; the shard may become permanently unassigned.
- 60% fail: If the corrupt shard is the primary, the cluster cannot allocate it, and reindexing from a snapshot may not include recent data not in the snapshot.