elasticsearch resource_error ai_generated true

EsRejectedExecutionException: rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, source=bulk]

ID: elasticsearch/too-many-requests-bulk-queue

Fix Rate: 80%
Confidence: 90%
Evidence: 1
First Seen: 2024-11-05

Version Compatibility

Version             Status  Introduced  Deprecated  Notes
elasticsearch 7.17  active
elasticsearch 8.10  active
elasticsearch 8.12  active

Root Cause

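Bulk requests arrive faster than the cluster can index them: the bulk (write) queue on the coordinating node fills up, and new bulk requests are rejected until in-flight work completes. The byte counters in the message (coordinating_and_primary_bytes, replica_bytes, all_bytes) are reported by Elasticsearch's indexing pressure accounting, so the rejection may come from the node's indexing memory limit rather than from the thread pool queue itself.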
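
One way to check whether write queues are actually rejecting work is to look at the per-node counters from GET _cat/thread_pool/write (the pool was named bulk before 6.3). The sketch below is a minimal, hypothetical helper that summarizes that output; the function name and the sample rows are illustrative assumptions, not output from a real cluster.

```python
def nodes_with_rejections(rows):
    """Return {node_name: rejected_count} for nodes that rejected writes.

    `rows` is the parsed JSON from
    GET _cat/thread_pool/write?format=json&h=node_name,name,active,queue,rejected
    The _cat API reports numeric columns as strings, hence the int() call.
    """
    rejected = {}
    for row in rows:
        count = int(row["rejected"])
        if count > 0:
            rejected[row["node_name"]] = count
    return rejected


# Made-up sample rows in the shape described above.
sample = [
    {"node_name": "node-1", "name": "write", "active": "8",
     "queue": "200", "rejected": "57"},
    {"node_name": "node-2", "name": "write", "active": "1",
     "queue": "0", "rejected": "0"},
]
print(nodes_with_rejections(sample))  # {'node-1': 57}
```

The rejected counter is cumulative since node start, so it is the change between two readings that indicates current pressure. If it stays flat while the error keeps occurring, the indexing_pressure section of GET _nodes/stats may be worth checking instead.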

Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/rejected-execution.html

Workarounds

  1. (85% success) Implement exponential backoff retry in the client. For example, with elasticsearch-py in Python, call es.bulk(body=docs) in a loop of up to 5 attempts and sleep 2 ** attempt seconds after each failed attempt before retrying.
  2. (75% success) Increase the write queue size temporarily. On the versions listed above, the thread pool is named write (it was called bulk before 6.3) and its settings are static node settings, so the change goes in elasticsearch.yml on each node, followed by a restart, rather than through PUT _cluster/settings: thread_pool.write.queue_size: 2000
  3. (80% success) Scale up the coordinating nodes, either by adding more nodes or by increasing their heap size, so they can handle higher throughput.
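
The exponential backoff workaround above can be sketched as a small wrapper. This is a minimal sketch under assumptions, not a definitive implementation: call_with_backoff and its parameters are hypothetical names, and it assumes the client exception exposes the HTTP status as a status_code attribute, as elasticsearch-py's TransportError (7.x) and ApiError (8.x) do.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, backing off exponentially on HTTP 429.

    `send` is any zero-argument callable, for example
    lambda: es.bulk(body=docs) with an elasticsearch-py client.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            # Assumption: the exception carries the HTTP status as `status_code`.
            rejected = getattr(exc, "status_code", None) == 429
            last_attempt = attempt == max_attempts - 1
            if not rejected or last_attempt:
                raise  # not a rejection, or retries exhausted
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Unlike a bare loop that swallows every exception, this version retries only on 429 and re-raises once the attempts are used up, so a persistent rejection is not silently dropped. For streamed indexing, elasticsearch-py's helpers.streaming_bulk also accepts max_retries, initial_backoff, and max_backoff, which may make a hand-written loop unnecessary.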

Dead Ends

Common approaches that don't work:

  1. (55% fail) Setting a very large queue size. Large queues can lead to high memory usage and increased latency, potentially causing out-of-memory errors or degraded performance.

  2. (75% fail) Dropping rejected requests without retrying. Without retries, rejected bulk requests are lost permanently, leading to data loss and incomplete indexing.

  3. (80% fail) Reducing the number of nodes. Fewer nodes can increase per-node load and worsen queue pressure, making the error more frequent.
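
On the point about lost requests: a bulk call can also return HTTP 200 while individual items are rejected with status 429, which the response flags with "errors": true. The following is a minimal sketch of picking out only those items for a retry. The helper name, the (action, source) pairing, and the sample index name are assumptions made for illustration.

```python
def rejected_operations(operations, response):
    """Return the (action, source) pairs that were rejected with status 429.

    `operations` is a list of (action, source) tuples in the order they were
    sent; `response` is the parsed _bulk response body, whose items come back
    in the same order as the request.
    """
    if not response.get("errors"):
        return []
    retry = []
    for op, item in zip(operations, response["items"]):
        # Each item has a single key naming the action (index, create, ...).
        (result,) = item.values()
        if result.get("status") == 429:
            retry.append(op)
    return retry


# Hypothetical request and response; "logs" is a made-up index name.
ops = [
    ({"index": {"_index": "logs", "_id": "1"}}, {"msg": "a"}),
    ({"index": {"_index": "logs", "_id": "2"}}, {"msg": "b"}),
]
resp = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}
to_retry = rejected_operations(ops, resp)
```

Resending only to_retry, after a backoff delay, avoids re-indexing documents that already succeeded.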