Tags: kafka · runtime_error · ai_generated: true

org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.

ID: kafka/consumer-group-rebalance-timeout

Fix rate: 82%
Confidence: 88%
Evidence: 1
First seen: 2024-01-10

Version Compatibility

Version     | Status | Introduced | Deprecated | Notes
Kafka 3.6.0 | active |            |            |
Kafka 3.7.0 | active |            |            |

Root Cause

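A consumer group rebalance was triggered, or a group operation such as an offset commit was attempted, while another rebalance was still in progress. This typically happens when consumers are slow to rejoin the group or when network delays hold up the rebalance.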
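Slow joins often come from the poll loop itself. Assuming the Java client, a consumer only rejoins a rebalancing group from inside `poll()`, and it uses `max.poll.interval.ms` (default 300000 ms) as its rebalance timeout. The sketch below is a back-of-the-envelope check under those assumptions: it treats the worst-case gap between `poll()` calls as `max.poll.records` multiplied by a per-record processing time you measure yourself (a hypothetical input), and it ignores fetch and network latency.

```java
// Sketch: estimate whether a busy consumer can rejoin a rebalance in time.
// Assumptions: the Java consumer rejoins only from inside poll(), and its
// rebalance timeout equals max.poll.interval.ms. perRecordMs is a
// hypothetical measured value; fetch and network latency are ignored.
public class RejoinBudget {

    /** Worst-case time (ms) spent processing one full batch between poll() calls. */
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return (long) maxPollRecords * perRecordMs;
    }

    /** True if a full batch finishes before the rebalance timeout expires. */
    static boolean rejoinsInTime(int maxPollRecords, long perRecordMs, long maxPollIntervalMs) {
        return worstCaseBatchMs(maxPollRecords, perRecordMs) < maxPollIntervalMs;
    }

    public static void main(String[] args) {
        long defaultIntervalMs = 300_000L; // default max.poll.interval.ms
        // 500 records at 800 ms each = 400 s, longer than the 300 s default.
        System.out.println(rejoinsInTime(500, 800, defaultIntervalMs)); // false
        // 250 records at 800 ms each = 200 s, which fits.
        System.out.println(rejoinsInTime(250, 800, defaultIntervalMs)); // true
    }
}
```

If the check fails, lowering `max.poll.records` shortens each batch, while raising `max.poll.interval.ms` gives the group more time to wait for it.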

generic

Official Documentation

https://kafka.apache.org/documentation/#consumer_rebalance

Workarounds

  1. (80% success) Give the group more time to finish a rebalance. The Java consumer has no `rebalance.timeout.ms` setting; it uses `max.poll.interval.ms` (default 300000 ms) as its rebalance timeout, so set that value above the longest time your code spends between `poll()` calls. Keep `max.poll.records` at or below its default of 500 so each poll carries less processing work.
  2. (90% success) Use static group membership: set `group.instance.id` to a unique, stable value per consumer instance so that a consumer restarting within `session.timeout.ms` rejoins without triggering a rebalance.
  3. (85% success) Example fix in the Java consumer (values are illustrative; size them to your workload): `props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000); props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);`
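A minimal sketch of the consumer settings involved, built with plain `java.util.Properties` so it runs without `kafka-clients` on the classpath. It assumes the Java client, where the rebalance timeout is taken from `max.poll.interval.ms` (there is no `rebalance.timeout.ms` consumer property). The broker address, group id, instance-id naming scheme, and numeric values are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

// Sketch: consumer settings that reduce rebalance pressure, using the
// standard Java consumer property names as plain string keys.
// All concrete values passed in main() are illustrative assumptions.
public class RebalanceFriendlyConfig {

    static Properties build(String bootstrapServers, String groupId, String instanceId,
                            long maxPollIntervalMs, int maxPollRecords) {
        if (instanceId == null || instanceId.isEmpty()) {
            throw new IllegalArgumentException("group.instance.id must be unique and non-empty");
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // The Java consumer uses this as its rebalance timeout.
        props.put("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        // Fewer records per poll shortens the gap between poll() calls.
        props.put("max.poll.records", Integer.toString(maxPollRecords));
        // Static membership: a stable id per instance avoids rebalances on quick restarts.
        props.put("group.instance.id", instanceId);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical naming scheme: one stable id per host or pod.
        Properties props = build("localhost:9092", "orders-app", "orders-app-0", 600_000L, 100);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With `kafka-clients` available, the same keys are exposed as `ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG`, `ConsumerConfig.MAX_POLL_RECORDS_CONFIG`, and `ConsumerConfig.GROUP_INSTANCE_ID_CONFIG`, and the resulting `Properties` (plus key and value deserializers) can be passed to `new KafkaConsumer<>(props)`. Each running instance needs its own `group.instance.id`; two live consumers sharing one id will fence each other.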

Dead Ends

Common approaches that don't work:

  1. 70% fail

    It only postpones the error; the rebalance will still fail if consumers are slow.

  2. 85% fail

    It exacerbates the problem by making consumers appear dead prematurely.