elasticsearch
system_error
ai_generated
true
ShardLockObtainFailedException: [my_index][0] obtaining shard lock failed
ID: elasticsearch/primary-shard-not-allocated-due-to-shard-lock
Fix Rate: 82%
Confidence: 85%
Evidence: 1
First Seen: 2024-03-12
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| 7.17.10 | active | — | — | — |
| 8.6.2 | active | — | — | — |
| 8.11.0 | active | — | — | — |
Root Cause
A shard lock cannot be acquired because the shard is still being recovered or a previous node crash left a stale lock file on disk.
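Before applying a workaround, it can help to confirm which of the two causes applies by asking the cluster why the shard is unassigned. The sketch below builds a request for the Cluster Allocation Explain API using only the Python standard library. The base URL `http://localhost:9200`, the absence of authentication, and the helper names are assumptions for illustration, not part of this entry.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard, primary=True):
    """Build a POST /_cluster/allocation/explain request for one shard."""
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain_allocation(base_url, index, shard, primary=True, timeout=10):
    """Send the request and return the decoded JSON response."""
    req = build_explain_request(base_url, index, shard, primary)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed local, unsecured cluster; adjust URL and add auth as needed.
    info = explain_allocation("http://localhost:9200", "my_index", 0)
    print(json.dumps(info, indent=2))
```

The response text for an affected shard would normally include the ShardLockObtainFailedException message, which helps distinguish a lock problem from other allocation failures.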
Official Documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-recovery.html
Workarounds
- 90% success: Remove the stale shard lock file manually. Find the shard directory under the node's data path, <path.data>/nodes/0/indices/<index-uuid>/0/ (this is the path.data location, not ES_PATH_CONF, which points to the config directory), delete the 'index.lock' or 'shard.lock' file, then restart the node.
- 75% success: Reroute the unassigned shard using the Cluster Reroute API. The allocate_stale_primary command requires accept_data_loss: true, so any writes missing from the stale copy are lost: POST /_cluster/reroute { "commands": [{ "allocate_stale_primary": { "index": "my_index", "shard": 0, "node": "my_node", "accept_data_loss": true } }] }
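The two workarounds above can be sketched in Python as follows. This is a minimal illustration, not a definitive procedure: `find_lock_files` only lists files ending in `.lock` below a shard directory and deletes nothing, since the exact lock file names may differ between versions, and `build_stale_primary_reroute` only builds the JSON body for POST /_cluster/reroute. Both helper names and the example shard path are hypothetical.

```python
import json
from pathlib import Path


def find_lock_files(shard_dir):
    """Return all *.lock files below a shard directory, sorted by path.

    Read-only: this lists candidates for manual review and deletes nothing.
    """
    return sorted(p for p in Path(shard_dir).rglob("*.lock") if p.is_file())


def build_stale_primary_reroute(index, shard, node):
    """Build the JSON body for POST /_cluster/reroute (allocate_stale_primary)."""
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Required by the API; acknowledges possible data loss.
                    "accept_data_loss": True,
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical shard directory; substitute the real data path and index UUID.
    for lock in find_lock_files("/var/lib/elasticsearch/nodes/0/indices"):
        print(lock)
    print(json.dumps(build_stale_primary_reroute("my_index", 0, "my_node"), indent=2))
```

Listing the files first keeps the destructive step manual, and printing the reroute body makes it easy to review the index, shard, and node values before sending the request.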
Dead Ends
Common approaches that don't work:
- 70% fail: Deleting the index. This loses all data and may not resolve the underlying lock issue if the file system is corrupted.
- 85% fail: Restarting the node without removing the lock file. A restart does not remove stale lock files, so the lock persists and the same error occurs again.