elasticsearch resource_error ai_generated true

EsRejectedExecutionException: rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, source=bulk]

ID: elasticsearch/too-many-requests-bulk-queue


Fix Rate: 80%

Confidence: 90%

Evidence: 1

First Seen: 2024-11-05

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
elasticsearch 7.17	active	—	—	—
elasticsearch 8.10	active	—	—	—
elasticsearch 8.12	active	—	—	—

Root Cause

The bulk (write) queue on the coordinating node is full because indexing requests arrive faster than the node can process them, so new bulk requests are rejected.
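
To check which nodes are actually rejecting work, the per-node thread pool counters can be inspected. The sketch below assumes rows in the shape returned by `GET _cat/thread_pool/write?format=json&h=node_name,name,queue,rejected` (the pool is named `write` on the versions listed above). The function name and the sample rows are hypothetical.

```python
def nodes_with_rejections(rows):
    """Return (node_name, rejected, queue) for nodes whose pool has rejected work.

    The _cat API typically returns numeric columns as strings when
    format=json is used, so they are converted to int before comparing.
    """
    flagged = []
    for row in rows:
        rejected = int(row["rejected"])
        if rejected > 0:
            flagged.append((row["node_name"], rejected, int(row["queue"])))
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample rows, not taken from a real cluster.
sample_rows = [
    {"node_name": "coord-1", "name": "write", "queue": "200", "rejected": "57"},
    {"node_name": "coord-2", "name": "write", "queue": "3", "rejected": "0"},
    {"node_name": "data-1", "name": "write", "queue": "180", "rejected": "412"},
]

for node, rejected, queue in nodes_with_rejections(sample_rows):
    print(f"{node}: rejected={rejected} queue={queue}")
```

A `rejected` counter that keeps growing between two samples suggests the node is still under pressure, since the counter is cumulative since node start.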

generic

Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/rejected-execution.html

Workarounds

85% success: Implement exponential backoff retry in the client. For example, in Python using elasticsearch-py:
```
from time import sleep

from elasticsearch import Elasticsearch

# Assumed local cluster URL; replace with your own endpoint.
es = Elasticsearch("http://localhost:9200")

# `docs` is the bulk payload prepared by the caller.
for attempt in range(5):
    try:
        es.bulk(body=docs)
        break
    except Exception:
        # Back off 1, 2, 4, 8, 16 seconds between attempts.
        sleep(2 ** attempt)
```
75% success: Increase the queue size of the write thread pool (called bulk in older releases). On the versions listed above this is a static node setting: it goes in elasticsearch.yml and takes effect after a node restart, and it cannot be changed through PUT _cluster/settings. The value 2000 is an example:
```
thread_pool.write.queue_size: 2000
```
80% success: Scale up the coordinating nodes by adding more nodes or increasing their heap size to handle higher throughput.
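
The backoff workaround above can be wrapped in a small reusable helper. This is a minimal sketch, not part of elasticsearch-py: the name `retry_with_backoff` and its parameters are hypothetical. It adds two things the inline loop lacks: a cap with random jitter on the delay, and re-raising the last error so a batch is never dropped silently.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits up to base_delay * 2**attempt seconds (capped at max_delay)
    between attempts and re-raises the last error if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the failure to the caller.
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] keeps many clients
            # from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

With the client from the first workaround, a call might look like `retry_with_backoff(lambda: es.bulk(body=docs))`. Catching every `Exception` mirrors the original loop. A production version would likely retry only on HTTP 429 responses.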

Dead Ends

Common approaches that don't work:

55% fail: Raising the queue size to a very large value. Large queues hold more pending requests in memory and increase latency, which can cause OOM errors or degraded performance.
75% fail: Dropping rejected requests without retrying. A rejected bulk request is not indexed, so without a retry its documents are lost and the index is left incomplete.
80% fail: Reducing the number of nodes. Fewer nodes means more load per node, which worsens queue pressure and makes the error more frequent.
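
The memory concern behind large queue sizes can be made concrete with rough arithmetic: each queued bulk request keeps its payload in memory until it is processed. The figures below (average request size and queue sizes) are hypothetical, and the estimate ignores per-request overhead.

```python
MIB = 1024 * 1024


def queued_payload_bytes(queue_size, avg_request_bytes):
    """Upper bound on payload bytes held by a completely full queue on one node."""
    return queue_size * avg_request_bytes


avg_request = 5 * MIB  # hypothetical average bulk request size

for queue_size in (200, 2000):
    held_mib = queued_payload_bytes(queue_size, avg_request) / MIB
    print(f"queue_size={queue_size}: up to {held_mib:.0f} MiB of queued payloads")
```

Under these assumptions a tenfold larger queue can pin roughly ten times as much heap, which is why a bigger queue tends to move the failure from rejections to memory pressure.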