ES_PERSISTENT_TASK_ASSIGN_FAIL elasticsearch runtime_error ai_generated partial

Persistent task assignment failure: task [task_id_123] could not be assigned to node [node-1] after 5 attempts

PersistentTaskException: task [cluster:admin/persistent/assignment] failed to assign task [task_id_123] to node [node-1] after [5] attempts

ID: elasticsearch/persistent-task-assignment-failure

Fix rate: 82%
Confidence: 85%
Evidence count: 1
First seen: 2024-06-15

Version compatibility

Version  Status  Introduced  Deprecated  Notes
7.17.0   active
8.11.0   active
8.12.0   active

Root cause analysis

A persistent task (e.g., a rollup job, transform, or machine learning job) cannot be assigned to any eligible node. Typical causes are node attribute or role mismatches, resource constraints, and cluster topology changes during a rolling restart.
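
One way to confirm this diagnosis is to look at the persistent task metadata in the cluster state and list the tasks that have no executor node. The following is a minimal sketch, not a definitive tool. It assumes the JSON shape returned by `GET _cluster/state/metadata` (a `metadata.persistent_tasks.tasks` array whose entries carry an `assignment` object with `executor_node` and `explanation`). The helper name `unassigned_persistent_tasks` and the sample payload are hypothetical.

```python
from typing import Any


def unassigned_persistent_tasks(cluster_state: dict[str, Any]) -> list[dict[str, str]]:
    """List persistent tasks that currently have no executor node.

    `cluster_state` is the parsed JSON body of `GET _cluster/state/metadata`.
    The field names used here are an assumption about that response shape.
    """
    tasks = (
        cluster_state.get("metadata", {})
        .get("persistent_tasks", {})
        .get("tasks", [])
    )
    unassigned = []
    for task in tasks:
        assignment = task.get("assignment") or {}
        # A task with no executor node is waiting for assignment.
        if assignment.get("executor_node") is None:
            unassigned.append(
                {
                    "id": task.get("id", ""),
                    "explanation": assignment.get("explanation", ""),
                }
            )
    return unassigned


# Hypothetical sample payload, trimmed to the fields the helper reads.
sample_state = {
    "metadata": {
        "persistent_tasks": {
            "tasks": [
                {
                    "id": "task_id_123",
                    "assignment": {
                        "executor_node": None,
                        "explanation": "no node matches the required attributes",
                    },
                },
                {
                    "id": "task_id_456",
                    "assignment": {"executor_node": "node-1", "explanation": ""},
                },
            ]
        }
    }
}

for entry in unassigned_persistent_tasks(sample_state):
    print(f"{entry['id']}: {entry['explanation']}")
```

The `explanation` text, when present, is usually the quickest hint as to whether the cause is an attribute mismatch or a capacity limit.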

generic

Official documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html

Solutions

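  1. Ensure every node that should run the task has the required attributes set in `elasticsearch.yml` (e.g., `node.attr.rack: r1`), then restart nodes one at a time, waiting for shard recovery to finish after each restart.
  2. Trigger a fresh assignment by stopping and restarting the job that owns the task through its own API (for a transform, `POST _transform/<transform_id>/_stop` followed by `POST _transform/<transform_id>/_start`). The `_tasks` API offers `POST _tasks/<task_id>/_cancel` but has no `_retry` endpoint, so it cannot be used to re-run an assignment.
  3. Check node resource availability (CPU, memory) and scale up existing nodes or add nodes to the cluster to free capacity.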
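
The attribute check in the first step can be sketched as a small helper that scans a nodes-info response for nodes lacking a custom attribute. This is only an illustration. It assumes the `nodes` / `attributes` layout of `GET _nodes`, and the function name `nodes_missing_attribute`, the attribute `rack`, and the sample data are made up for the example.

```python
from typing import Any, Optional


def nodes_missing_attribute(
    nodes_info: dict[str, Any], attr: str, expected: Optional[str] = None
) -> list[str]:
    """Return the names of nodes that lack `attr` or, if `expected` is given,
    whose value for `attr` differs from it.

    `nodes_info` is the parsed JSON body of `GET _nodes` (assumed shape).
    """
    missing = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        value = node.get("attributes", {}).get(attr)
        if value is None or (expected is not None and value != expected):
            # Fall back to the node id when the node name is absent.
            missing.append(node.get("name", node_id))
    return sorted(missing)


# Hypothetical three-node cluster: node-2 has no rack attribute at all.
sample_nodes = {
    "nodes": {
        "aaa": {"name": "node-1", "attributes": {"rack": "r1"}},
        "bbb": {"name": "node-2", "attributes": {}},
        "ccc": {"name": "node-3", "attributes": {"rack": "r2"}},
    }
}

print(nodes_missing_attribute(sample_nodes, "rack"))
print(nodes_missing_attribute(sample_nodes, "rack", expected="r1"))
```

Any node the helper reports is a candidate for the `elasticsearch.yml` fix and a subsequent one-by-one restart.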

Failed attempts

Common but ineffective approaches:

  1. 85% failure rate

    A forceful restart causes more assignment failures: tasks lose their target nodes and cannot be reassigned mid-restart.

  2. 75% failure rate

    Retries do not fix the underlying node attribute or resource issue; they only delay the eventual failure.

  3. 90% failure rate

    Removing the task discards its progress, and the feature that owns the task may recreate it, causing the same error again.
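
The forceful-restart failure described above comes from moving on to the next node before the cluster has settled. A rolling restart script could gate each step on the cluster health response, as in this rough sketch. It assumes the standard fields of `GET _cluster/health` (`status`, `number_of_nodes`, `initializing_shards`, `relocating_shards`). The function name `safe_to_restart_next` and the sample values are hypothetical, and a real procedure would likely check more than this.

```python
from typing import Any


def safe_to_restart_next(health: dict[str, Any], expected_nodes: int) -> bool:
    """Decide whether the next node in a rolling restart may be stopped.

    `health` is the parsed JSON body of `GET _cluster/health` (assumed shape).
    The cluster counts as settled when it is green, every node has rejoined,
    and no shards are still initializing or relocating.
    """
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("initializing_shards", 0) == 0
        and health.get("relocating_shards", 0) == 0
    )


# Hypothetical snapshots taken during and after recovery of a 3-node cluster.
recovering = {
    "status": "yellow",
    "number_of_nodes": 3,
    "initializing_shards": 4,
    "relocating_shards": 0,
}
settled = {
    "status": "green",
    "number_of_nodes": 3,
    "initializing_shards": 0,
    "relocating_shards": 0,
}

print(safe_to_restart_next(recovering, expected_nodes=3))
print(safe_to_restart_next(settled, expected_nodes=3))
```

Polling this check between restarts gives persistent tasks a stable set of candidate nodes before the topology changes again.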