ES_PERSISTENT_TASK_ASSIGN_FAIL elasticsearch runtime_error ai_generated partial

PersistentTaskException: task [cluster:admin/persistent/assignment] failed to assign task [task_id_123] to node [node-1] after [5] attempts

ID: elasticsearch/persistent-task-assignment-failure

82% Fix Rate

85% Confidence

1 Evidence

2024-06-15 First Seen

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
7.17.0	active	—	—	—
8.11.0	active	—	—	—
8.12.0	active	—	—	—

Root Cause

A persistent task (e.g., an ML job, a transform, or a rollup job) cannot be assigned to any available node because of node attribute mismatches, resource constraints, or cluster topology changes during a rolling restart.
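
To see which tasks are affected, one option is to inspect the persistent task entries in the cluster state. The sketch below is a minimal illustration, not a definitive diagnostic: it assumes the response of `GET _cluster/state/metadata` exposes tasks under `metadata.persistent_tasks.tasks`, each with an `assignment` object holding `executor_node` and `explanation`. The function name and the sample data are hypothetical and hand-written, not real cluster output.

```python
from typing import Any


def unassigned_persistent_tasks(cluster_state: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (task id, explanation) for every persistent task with no executor node."""
    # Assumed layout: metadata.persistent_tasks.tasks[*].assignment
    tasks = (
        cluster_state.get("metadata", {})
        .get("persistent_tasks", {})
        .get("tasks", [])
    )
    unassigned = []
    for task in tasks:
        assignment = task.get("assignment") or {}
        if assignment.get("executor_node") is None:
            unassigned.append(
                (task.get("id", "<unknown>"), assignment.get("explanation", ""))
            )
    return unassigned


# Hypothetical sample shaped like a `GET _cluster/state/metadata` response.
sample_state = {
    "metadata": {
        "persistent_tasks": {
            "tasks": [
                {
                    "id": "task_id_123",
                    "assignment": {
                        "executor_node": None,
                        "explanation": "no eligible node found",
                    },
                },
                {
                    "id": "task_id_456",
                    "assignment": {"executor_node": "node-1", "explanation": ""},
                },
            ]
        }
    }
}

for task_id, why in unassigned_persistent_tasks(sample_state):
    print(f"{task_id}: {why}")
```

The `explanation` field, when present, is usually the quickest pointer to whether the cause is an attribute mismatch, a capacity limit, or a node that left the cluster.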

Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html

Workarounds

85% success Ensure all nodes have the required attributes set in `elasticsearch.yml` (e.g., `node.attr.rack: r1`) and restart nodes one by one, waiting for shard recovery after each restart.
75% success Cancel the stuck task with the task management API, `POST _tasks/task_id_123/_cancel`, then start it again through the API of the feature that owns the task. The `_tasks` API has no `_retry` endpoint.
80% success Check node resource availability (CPU, memory) and scale up or add more nodes to the cluster to free up capacity.
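
The attribute and resource checks in the workarounds above can be scripted against saved API responses. The following is a rough sketch under stated assumptions: it expects the `GET _nodes` response to list custom attributes under `nodes.<id>.attributes` (without the `node.attr.` prefix) and the `GET _nodes/stats/jvm` response to report `jvm.mem.heap_used_percent` per node. The helper names, the sample data, and the 85% heap threshold are hypothetical choices for illustration.

```python
from typing import Any


def nodes_missing_attribute(nodes_info: dict[str, Any], attr: str) -> list[str]:
    """Names of nodes that do not define the custom attribute `attr`."""
    missing = []
    for node in nodes_info.get("nodes", {}).values():
        if attr not in node.get("attributes", {}):
            missing.append(node.get("name", "<unnamed>"))
    return sorted(missing)


def nodes_over_heap(nodes_stats: dict[str, Any], threshold_percent: int) -> list[str]:
    """Names of nodes whose JVM heap usage is at or above `threshold_percent`."""
    busy = []
    for node in nodes_stats.get("nodes", {}).values():
        used = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if used is not None and used >= threshold_percent:
            busy.append(node.get("name", "<unnamed>"))
    return sorted(busy)


# Hypothetical samples shaped like `GET _nodes` and `GET _nodes/stats/jvm` responses.
sample_info = {
    "nodes": {
        "id-a": {"name": "node-1", "attributes": {"rack": "r1"}},
        "id-b": {"name": "node-2", "attributes": {}},
    }
}
sample_stats = {
    "nodes": {
        "id-a": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 62}}},
        "id-b": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 91}}},
    }
}

print("missing rack attribute:", nodes_missing_attribute(sample_info, "rack"))
print("heap at or above 85%:", nodes_over_heap(sample_stats, 85))
```

Running both checks before a rolling restart gives a short list of nodes to fix first, so that tasks have an eligible target when each node rejoins.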

Dead Ends

Common approaches that don't work:

85% fail Force-restarting nodes causes more assignment failures, because tasks lose their target nodes and cannot be reassigned mid-restart.
75% fail Retrying the assignment does not fix the underlying node attribute or resource issue; it only delays the eventual failure.
90% fail Removing the task loses its progress, and the feature that owns it may recreate the task, causing the same error again.