kafka network_error ai_generated partial

org.apache.kafka.common.errors.FetchTimeoutException: Fetch request timed out after 30000 ms for partition my-topic-0

ID: kafka/consumer-fetch-timeout

Fix Rate: 75%
Confidence: 82%
Evidence: 1
First Seen: 2023-11-20

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
kafka 3.0.0   active
kafka 3.3.0   active
kafka 3.5.0   active

Root Cause

The consumer's fetch request to the broker exceeded the configured timeout, typically because of network congestion, broker overload, or a slow partition leader. The 30000 ms in the message matches the consumer's default request.timeout.ms.
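
The consumer settings that bear on fetch timing can be collected in one place, as in the minimal sketch below. It only builds a java.util.Properties object and does not connect to a broker. The class name, bootstrap address, and group id are hypothetical, and the values shown are the documented client defaults, not tuned recommendations.

```java
import java.util.Properties;

public class FetchTimeoutConfigSketch {

    // Builds consumer properties relevant to fetch timing.
    // All values are illustrative defaults; adjust them for your cluster.
    public static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // How long the client waits for a response to a request (default 30000 ms).
        props.setProperty("request.timeout.ms", "30000");
        // How long the broker may hold a fetch while waiting for fetch.min.bytes (default 500 ms).
        props.setProperty("fetch.max.wait.ms", "500");
        // Minimum data the broker should return per fetch (default 1 byte).
        props.setProperty("fetch.min.bytes", "1");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical address and group id, for illustration only.
        Properties props = consumerProps("localhost:9092", "demo-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object could be passed to a KafkaConsumer constructor; keeping request.timeout.ms comfortably larger than fetch.max.wait.ms avoids the client giving up while the broker is still legitimately holding the fetch.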

generic

Official Documentation

https://kafka.apache.org/documentation/#consumerconfigs_fetch.max.wait.ms

Workarounds

  1. 80% success: Tune the broker's 'num.network.threads' so it can handle more concurrent fetch requests and spend less time queuing them. Raising 'fetch.purgatory.purge.interval.requests' is also suggested, though that setting only controls how often (in number of requests) the fetch request purgatory is purged.
  2. 85% success: Add a retry mechanism in the consumer with exponential backoff on FetchTimeoutException, and reduce fetch.min.bytes (if it has been raised above its default of 1) so the broker can return smaller, faster responses.
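
The retry-with-backoff idea in workaround 2 can be sketched without any Kafka dependency. The helper below is a generic illustration, not part of the Kafka client API: the class and method names are hypothetical, the fetch is stood in for by a Callable, and the caller supplies a predicate to decide which exceptions count as a fetch timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

public class BackoffRetrySketch {

    // Delay before the retry that follows failed attempt number `attempt` (0-based):
    // baseMillis doubled once per prior attempt, capped at maxMillis.
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Runs `action`, retrying only when `isRetriable` accepts the exception.
    // Gives up and rethrows the last exception after maxAttempts tries.
    public static <T> T callWithRetry(Callable<T> action,
                                      Predicate<Exception> isRetriable,
                                      int maxAttempts,
                                      long baseMillis,
                                      long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!isRetriable.test(e)) {
                    throw e; // not a timeout: do not retry
                }
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated fetch: times out twice, then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated fetch timeout");
            }
            return "records";
        }, e -> e instanceof TimeoutException, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

In a real consumer the Callable would wrap the poll call and the predicate would match whichever timeout exception the client actually raises; capping the delay keeps a long outage from pushing the consumer past its max.poll.interval.ms.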

Dead Ends

Common approaches that don't work:

  1. 40% fail: This only masks the timeout and can lead to increased consumer latency and backlog, without fixing the underlying network or broker issue.
  2. 70% fail: Restarting re-establishes the connection but does not resolve persistent network or broker load issues; the timeout will recur.