elasticsearch
resource_error
ai_generated
true
EsRejectedExecutionException: rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, source=bulk]
ID: elasticsearch/too-many-requests-bulk-queue
Fix Rate: 80%
Confidence: 90%
Evidence: 1
First Seen: 2024-11-05
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| elasticsearch 7.17 | active | — | — | — |
| elasticsearch 8.10 | active | — | — | — |
| elasticsearch 8.12 | active | — | — | — |
Root Cause
The bulk queue on the coordinating node is full due to high indexing throughput, causing new bulk requests to be rejected.
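To check whether a node's queue is actually backing up, the per-node thread pool statistics can be summarized. The sketch below is a minimal illustration that works on the parsed JSON body of `GET _nodes/stats/thread_pool`. It assumes the pool is named `write` (the name used by the versions listed above); the function name `write_pool_pressure` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def write_pool_pressure(
    node_stats: Dict[str, Any], pool: str = "write"
) -> List[Tuple[str, int, int]]:
    """Return (node name, queued tasks, rejected tasks) per node, most rejections first.

    `node_stats` is assumed to be the parsed body of GET _nodes/stats/thread_pool.
    """
    rows = []
    for node in node_stats.get("nodes", {}).values():
        stats = node.get("thread_pool", {}).get(pool)
        if stats is None:
            # This node does not report the requested pool; skip it.
            continue
        rows.append((node.get("name", "?"), stats.get("queue", 0), stats.get("rejected", 0)))
    # Nodes with the most rejections are the likeliest bottleneck.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

A `rejected` counter that keeps growing on one node while `queue` stays near its limit suggests that node is the one turning bulk requests away.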
Official Documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/rejected-execution.html
Workarounds
- 85% success: Implement exponential backoff retry in the client. For example, with elasticsearch-py, wrap the es.bulk(...) call in a loop of up to 5 attempts and sleep 2 ** attempt seconds after each failed attempt before retrying.
- 75% success: Increase the write (bulk) queue size temporarily. In the versions listed above the pool is named write rather than bulk, and thread_pool.write.queue_size is a static node setting: set it in elasticsearch.yml (for example thread_pool.write.queue_size: 2000) and restart the node. It cannot be changed through PUT _cluster/settings.
- 80% success: Scale up the coordinating nodes by adding more nodes or increasing their heap size to handle higher throughput.
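The backoff workaround can be sketched as a small helper that is independent of the client library. This is a minimal illustration, not a definitive implementation: the names `retry_with_backoff` and `is_rejection` are hypothetical, and the check assumes the client exception exposes the HTTP status as `status_code` or `meta.status`, with 429 marking a rejected execution.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_rejection(exc: Exception) -> bool:
    """Assumption: the exception carries the HTTP status as `status_code` or `meta.status`."""
    status = getattr(exc, "status_code", None)
    if status is None:
        status = getattr(getattr(exc, "meta", None), "status", None)
    return status == 429


def retry_with_backoff(
    send: Callable[[], T],
    is_retryable: Callable[[Exception], bool] = is_rejection,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, doubling the wait after each retryable failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            # Re-raise non-retryable errors, and the last error once attempts run out.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Waits 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

With elasticsearch-py, `send` would be something like `lambda: es.bulk(body=docs)`, where `es` and `docs` are the client and payload from the workaround above. Unlike a bare `except Exception`, this re-raises errors that are not rejections, so mapping or authentication failures are not retried silently.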
Dead Ends
Common approaches that don't work:
- 55% fail: Large queue sizes can lead to high memory usage and increased latency, potentially causing OOM or degraded performance.
- 75% fail: Without retries, bulk requests are lost permanently, leading to data loss and incomplete indexing.
- 80% fail: Fewer nodes can increase per-node load and worsen the queue pressure, making the error more frequent.
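To avoid the silent data loss described above, a client can inspect each bulk response and resend only the operations that were rejected. The sketch below is a minimal illustration that works on the parsed body of a `_bulk` response. It assumes rejected items carry a per-item `status` of 429; the function name `rejected_item_positions` is hypothetical.

```python
from typing import Any, Dict, List


def rejected_item_positions(
    bulk_response: Dict[str, Any], retry_status: int = 429
) -> List[int]:
    """Return the positions of bulk items whose per-item status equals retry_status.

    Items in a bulk response are in the same order as the submitted operations,
    so each position maps back to the operation that should be resent.
    """
    positions = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict such as {"index": {...}} or {"create": {...}}.
        (result,) = item.values()
        if result.get("status") == retry_status:
            positions.append(position)
    return positions
```

The returned positions can be used to rebuild a smaller bulk request containing only the rejected operations, which is then sent again after a backoff delay.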