TASK_CANCELLED elasticsearch runtime_error ai_generated true

TaskCancellationException: task [id:12345] cancelled with reason [timeout] while waiting for completion

ID: elasticsearch/task-cancellation-exception

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2024-06-15

Version Compatibility

Version              Status   Introduced   Deprecated   Notes
Elasticsearch 7.17   active
Elasticsearch 8.5    active
Elasticsearch 8.10   active

Root Cause

A long-running task (e.g., reindex, snapshot) exceeded the configured timeout and was forcibly cancelled by the cluster.



Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html

Workarounds

  1. Increase the timeout on the request that waits for the task, e.g., `GET _tasks/<task_id>?wait_for_completion=true&timeout=2h`. Do not use `POST _tasks/_cancel?actions=...` for this: that endpoint cancels matching tasks rather than extending them. (80% success)
  2. Optimize the task by reducing batch size: for reindex, set `{"source": {"index": "<source_index>", "size": 500}, "dest": {"index": "new_index"}}`, where `source.size` is the number of documents per batch. (85% success)
  3. Increase `search.max_buckets` if the task involves heavy aggregation. The setting `search.max_buckets_per_cluster` is also sometimes suggested, but it does not appear to be a documented Elasticsearch setting; verify it against the reference for your version before applying it. (75% success)
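
A minimal Python sketch of workarounds 1 and 2 combined, assuming the long-running task is a reindex: it starts the reindex asynchronously with a smaller batch size and then polls the task instead of holding one request open until it times out. The helper names (`build_reindex_request`, `wait_for_task`), the injected `get_json` callable, and the polling defaults are illustrative assumptions, not part of any Elasticsearch client.

```python
import json
import time
from typing import Callable, Dict, Tuple


def build_reindex_request(source_index: str, dest_index: str,
                          batch_size: int = 500) -> Tuple[str, str, str]:
    """Return (method, path, body) for an asynchronous reindex request."""
    body = {
        # source.size is the per-batch document count for reindex
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    # wait_for_completion=false makes Elasticsearch return a task ID immediately
    return "POST", "/_reindex?wait_for_completion=false", json.dumps(body)


def wait_for_task(task_id: str,
                  get_json: Callable[[str], Dict],
                  poll_seconds: float = 5.0,
                  max_polls: int = 720) -> Dict:
    """Poll GET /_tasks/<task_id> until the response reports completed.

    get_json is a caller-supplied function (hypothetical) that performs
    the HTTP GET for the given path and returns the parsed JSON body.
    """
    for _ in range(max_polls):
        status = get_json(f"/_tasks/{task_id}")
        if status.get("completed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Passing the HTTP call in as `get_json` keeps the sketch independent of any particular client library; in practice it could wrap whichever HTTP client the application already uses.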


Dead Ends

Common approaches that don't work:

  1. Increasing task timeout in elasticsearch.yml (e.g., `task.timeout: 60m`) without analyzing actual task duration (70% fail)

    This may mask underlying performance issues (e.g., slow disk, insufficient memory) and cause cascading failures.

  2. Restarting the cluster to clear all tasks (80% fail)

    Restarting drops all ongoing tasks, but the error will recur if the root cause (e.g., slow shard recovery) is not addressed.

  3. Setting `task.timeout: 0` to disable timeout (90% fail)

    This can lead to indefinite task hangs and resource exhaustion, as the cluster will never cancel stuck tasks.
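
Since the first dead end comes from changing timeouts without measuring how long tasks actually run, the following Python sketch shows one way to rank tasks by duration. It assumes the input is the parsed JSON from `GET _tasks?detailed=true`, which groups tasks under `nodes.<node_id>.tasks` and reports `running_time_in_nanos` for each; the function name `longest_running_tasks` is hypothetical.

```python
from typing import Dict, List, Tuple


def longest_running_tasks(tasks_response: Dict,
                          top: int = 5) -> List[Tuple[str, str, float]]:
    """Return (task_id, action, running_seconds) for the longest-running tasks."""
    rows = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            # Convert nanoseconds to seconds for readability
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            rows.append((task_id, task.get("action", ""), seconds))
    # Longest-running tasks first
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]
```

Comparing these durations against the timeout in use gives a basis for choosing a new value, or for deciding that the task itself needs tuning.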