elasticsearch data_error ai_generated true

TranslogCorruptedException: translog corruption detected at position 12345

ID: elasticsearch/translog-corruption-on-flush

Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2024-06-20

Version Compatibility

Version   Status   Introduced   Deprecated   Notes
7.17.15   active
8.7.0     active
8.13.2    active

Root Cause

The translog file is corrupted due to a sudden node crash, disk I/O error, or file system inconsistency during a flush operation.

Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/troubleshooting.html

Workarounds

  1. 88% success Stop the node that holds the shard, then use the 'elasticsearch-shard' CLI tool to truncate the corrupted translog: ./bin/elasticsearch-shard remove-corrupted-data --index my_index --shard-id 0. The tool discards the corrupted translog, so any operations recorded only there are lost. After the node restarts, the shard usually has to be allocated again through the cluster reroute API; the tool's output indicates the command to use.
  2. 80% success If the affected shard is a replica, discard the corrupted copy and allocate a new replica that recovers from the primary: POST /_cluster/reroute { "commands": [{ "allocate_replica": { "index": "my_index", "shard": 0, "node": "my_node" } }] }.
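
The two workarounds above can be sketched as a small Python helper that only builds the command line and the reroute request body, without contacting a cluster. This is a minimal sketch under stated assumptions: the function names are hypothetical, my_index, 0, and my_node are the placeholder values from the workarounds, and the --shard-id flag follows the documented syntax of the elasticsearch-shard tool, which should be checked against the version in use.

```python
import json
import shlex


def shard_tool_argv(index, shard_id):
    """Build the argv for 'elasticsearch-shard remove-corrupted-data'.

    Hypothetical helper: it only assembles the arguments. Run the resulting
    command on the node that holds the shard, with that node stopped.
    """
    return [
        "./bin/elasticsearch-shard",
        "remove-corrupted-data",
        "--index", index,
        "--shard-id", str(shard_id),
    ]


def allocate_replica_body(index, shard, node):
    """Build the JSON body for POST /_cluster/reroute (allocate_replica)."""
    return {
        "commands": [
            {"allocate_replica": {"index": index, "shard": shard, "node": node}}
        ]
    }


if __name__ == "__main__":
    # Print both artifacts so they can be reviewed before anything is executed.
    print(shlex.join(shard_tool_argv("my_index", 0)))
    print("POST /_cluster/reroute")
    print(json.dumps(allocate_replica_body("my_index", 0, "my_node"), indent=2))
```

Printing the command and the body first, rather than executing them, leaves room for a manual review, which matters here because truncating a translog discards data.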

Dead Ends

Common approaches that don't work:

  1. 95% fail Deleting the translog directly causes data loss and may leave the index in an inconsistent state that cannot be recovered.
  2. 90% fail A corrupted translog cannot be replayed; Elasticsearch will fail to open the shard and the error persists.