# TranslogCorruptedException: translog corruption detected at position 12345 while recovering shard [my_index][0]

- **ID:** `elasticsearch/translog-corruption-on-recovery`
- **Domain:** elasticsearch
- **Category:** data_error
- **Verification level:** ai_generated
- **Fix rate:** 82%

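## Root Cause

The shard's translog file was corrupted by a disk error, an abrupt shutdown, or a filesystem inconsistency, so recovery of the shard fails when the translog is replayed.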
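
Before repairing anything, it helps to know exactly which index and shard are affected. The sketch below pulls the index name, shard number, and byte position out of an error message. The message format is an assumption modeled on this entry's title, and `CorruptShard` and `parse_corruption_message` are hypothetical names; real log lines may be worded differently, so adjust the pattern to match your logs.

```python
import re
from typing import NamedTuple, Optional


class CorruptShard(NamedTuple):
    """Identifies the shard copy named in a translog corruption message."""
    index: str
    shard: int
    position: int


# Assumed message shape (modeled on this entry's title, not on a verified log line):
#   "... at position 12345 while recovering shard [my_index][0]"
_PATTERN = re.compile(
    r"position (?P<position>\d+).*?shard \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]"
)


def parse_corruption_message(message: str) -> Optional[CorruptShard]:
    """Return the affected shard, or None if the message does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return CorruptShard(
        index=match["index"],
        shard=int(match["shard"]),
        position=int(match["position"]),
    )
```

The returned index and shard number are the values the repair steps need when they refer to `my_index` and shard `0`.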

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| 7.10.0 | active | — | — |
| 7.17.0 | active | — | — |
| 8.0.0 | active | — | — |

## Solutions

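1. Stop the node that holds the corrupted shard copy, then discard the corrupted translog. The bundled `elasticsearch-shard` tool is the supported way to do this; it truncates the translog, so any operations that were not yet committed to Lucene are lost:
   ```
   bin/elasticsearch-shard remove-corrupted-data --index my_index --shard-id 0
   ```
   Deleting the translog directory by hand (e.g., `rm -rf /var/lib/elasticsearch/nodes/0/indices/<index-uuid>/0/translog`) is riskier, because the shard may refuse to open without a translog. After restarting the node, submit any reroute request that the tool printed, then retry the failed allocation so the shard recovers from the primary or a replica:
   ```
   POST _cluster/reroute?retry_failed=true
   ```
2. If the affected copy is a replica, cancel its allocation and let Elasticsearch recreate it from the primary:
   ```
   POST _cluster/reroute
   {
     "commands": [
       { "cancel": { "index": "my_index", "shard": 0, "node": "node-1" } }
     ]
   }
   ```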
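
The reroute calls in the steps above can be assembled programmatically rather than typed by hand. The following is a minimal sketch that only builds the request path and JSON body; it does not contact a cluster. The helper name `build_cancel_body` and the constant `RETRY_FAILED_PATH` are hypothetical, and the index, shard, and node values are the placeholders used in this entry.

```python
import json
from typing import Any, Dict

# Path used to retry allocations that previously failed (from step 1).
RETRY_FAILED_PATH = "/_cluster/reroute?retry_failed=true"

# Path used for explicit reroute commands such as "cancel" (from step 2).
REROUTE_PATH = "/_cluster/reroute"


def build_cancel_body(index: str, shard: int, node: str) -> Dict[str, Any]:
    """Build the body that cancels allocation of one shard copy on one node."""
    return {
        "commands": [
            {"cancel": {"index": index, "shard": shard, "node": node}}
        ]
    }


if __name__ == "__main__":
    # Print the request so it can be reviewed before sending it with any HTTP client.
    body = build_cancel_body("my_index", 0, "node-1")
    print("POST", REROUTE_PATH)
    print(json.dumps(body, indent=2))
```

Printing the request first makes it easy to double-check the node name before anything is sent, since cancelling the wrong copy triggers an unnecessary recovery.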

## Failed Attempts

- **Running a filesystem repair**: A filesystem repair may not fix translog corruption at the application level; Elasticsearch may still see invalid data. (90% failure rate)
- **Restarting the node**: The corruption persists in the translog file, so a restart triggers the same error during recovery. (95% failure rate)
