elasticsearch resource_error ai_generated true

ElasticsearchException: [index][0] recovery throttled due to max_bytes_per_sec [40mb]

ID: elasticsearch/max-bytes-per-sec-throttle

Fix Rate: 78%
Confidence: 85%
Evidence: 1
First Seen: 2024-06-15

Version Compatibility

Version                 Status    Introduced    Deprecated    Notes
elasticsearch 7.17.0    active
elasticsearch 8.10.0    active
elasticsearch 8.6.2     active

Root Cause

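Shard recovery is throttled because the node's indices.recovery.max_bytes_per_sec setting (40mb in the message above) is too low for the current recovery load. The limit applies to the total inbound and outbound recovery traffic on each node.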
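
To get a feel for what the limit means in practice, the sketch below estimates how long one shard takes to copy at a given rate. It is a rough illustration only: the 50gb shard size is a hypothetical figure, the helper names are made up for this example, and the estimate assumes the throttle is the only bottleneck and that a single recovery gets the node's whole budget.

```python
# Elasticsearch byte-size units are binary: "40mb" means 40 * 1024**2 bytes.
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_byte_size(value: str) -> int:
    """Convert a setting string such as "40mb" into a number of bytes."""
    text = value.strip().lower()
    # Check the two-letter suffixes first so "mb" is not mistaken for "b".
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNIT_BYTES[suffix])
    raise ValueError(f"unrecognized byte size: {value!r}")


def estimated_recovery_seconds(shard_size: str, max_bytes_per_sec: str) -> float:
    """Lower bound on copy time for one shard at the given recovery rate."""
    rate = parse_byte_size(max_bytes_per_sec)
    if rate <= 0:
        raise ValueError("rate must be positive for this estimate")
    return parse_byte_size(shard_size) / rate


# Hypothetical 50gb shard at the 40mb limit versus a raised 100mb limit.
at_default = estimated_recovery_seconds("50gb", "40mb")
at_raised = estimated_recovery_seconds("50gb", "100mb")
print(f"40mb/s: {at_default:.0f} s, 100mb/s: {at_raised:.0f} s")
```

With several shards recovering onto the same node at once, the same budget is shared, so real recovery times will be longer than this estimate.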

generic

Official Documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery.html

Workarounds

  1. 85% success Temporarily raise the recovery rate limit through the dynamic cluster settings API: PUT _cluster/settings { "transient": { "indices.recovery.max_bytes_per_sec": "100mb" } }
  2. 75% success Reduce the number of concurrent recoveries to limit recovery parallelism. Note that indices.recovery.concurrent_streams was removed in Elasticsearch 5.0 and is not accepted by the 7.x and 8.x versions listed above; on those versions the per-node control is cluster.routing.allocation.node_concurrent_recoveries, which defaults to 2 and can be changed dynamically.
  3. 90% success Add more dedicated hot nodes to spread the recovery load, then rebalance replicas.
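
The first workaround can be scripted. The following is a minimal sketch using only the Python standard library; it assumes an unsecured cluster reachable at http://localhost:9200 (a placeholder, adjust URL and authentication for a real cluster), and build_recovery_limit_request is a hypothetical helper name, not part of any Elasticsearch client. The script only builds and prints the request; sending it is left as a commented-out step.

```python
import json
import urllib.request

SETTING = "indices.recovery.max_bytes_per_sec"


def build_recovery_limit_request(base_url, max_bytes_per_sec, scope="transient"):
    """Build (but do not send) a PUT _cluster/settings request.

    Passing None as max_bytes_per_sec serializes to JSON null, which
    removes the override so the setting falls back to its default.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    body = {scope: {SETTING: max_bytes_per_sec}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local, unauthenticated endpoint; replace for a real cluster.
request = build_recovery_limit_request("http://localhost:9200", "100mb")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Request that clears the temporary override once recovery has finished.
reset_request = build_recovery_limit_request("http://localhost:9200", None)

# To actually apply a change, send it explicitly:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the reset request next to the raise makes it harder to forget to restore the limit after the recovery backlog clears.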

Dead Ends

Common approaches that don't work:

  1. 45% fail

    This can saturate network bandwidth and cause other node failures, especially in clusters with many concurrent recoveries.

  2. 70% fail

    Restarting only resets the recovery process temporarily; throttling reapplies once recovery resumes.

  3. 65% fail

    Setting max_bytes_per_sec to 0 disables throttling entirely, but unthrottled recovery can overwhelm the node's I/O, leading to OOM or disk saturation.