kafka runtime_error ai_generated true

org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.

ID: kafka/group-rebalance-timeout

Fix Rate: 75%
Confidence: 82%
Evidence: 1
First Seen: 2023-03-10

Version Compatibility

Version      Status  Introduced  Deprecated  Notes
Kafka 2.8.0  active  -           -           -
Kafka 3.0.0  active  -           -           -
Kafka 3.4.0  active  -           -           -
Kafka 3.6.0  active  -           -           -

Root Cause

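The consumer sent a request (such as an offset commit or a heartbeat) while the group coordinator was already rebalancing the consumer group, so the coordinator rejected the request. The consumer has to rejoin the group, normally by calling poll(), before the request can succeed.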

Official Documentation

https://kafka.apache.org/documentation/#consumerconfigs_max.poll.interval.ms

Workarounds

  1. 80% success
    Increase max.poll.interval.ms so the consumer has more time to process each batch before it is considered stalled and a rebalance is triggered. Lowering max.poll.records shrinks each batch and works toward the same goal:
    
    import java.util.Properties;
    
    Properties props = new Properties();
    props.put("max.poll.interval.ms", "600000");  // 10 minutes (the default is 5 minutes)
    props.put("max.poll.records", "100");         // Fewer records per poll (the default is 500)
    
    Also keep per-record processing fast, or hand records off to a worker thread so the polling thread can call poll() on schedule.
  2. 75% success
    Catch RebalanceInProgressException around the commit, let poll() complete the rebalance, then retry the commit. No sleep is needed, because poll() itself drives the rejoin:
    
    import java.time.Duration;
    import org.apache.kafka.common.errors.RebalanceInProgressException;
    
    try {
        consumer.commitSync();
    } catch (RebalanceInProgressException e) {
        // The group is mid-rebalance; poll() rejoins it and may return records.
        // Process those records instead of discarding them, because the retry
        // below commits the positions they advanced. process() is a
        // hypothetical stand-in for the application's record handler.
        process(consumer.poll(Duration.ofSeconds(1)));
        // The partition assignment may have changed during the rebalance.
        consumer.commitSync();
    }
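
The effect of the two settings in workaround 1 can be made concrete with a little arithmetic. The sketch below is illustrative only: the class name ConsumerTuning, the broker address, and the group name are hypothetical, and the code needs nothing beyond the JDK. It builds the consumer properties and computes the average time a single record may take before a full batch overruns max.poll.interval.ms.

```java
import java.util.Properties;

public class ConsumerTuning {

    /** Builds consumer settings; the keys are standard Kafka consumer config names. */
    public static Properties build(String bootstrapServers, String groupId,
                                   int maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // String values keep the settings readable through getProperty().
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    /** Average time one record may take before a full batch overruns the poll interval. */
    public static long perRecordBudgetMs(Properties props) {
        long intervalMs = Long.parseLong(props.getProperty("max.poll.interval.ms"));
        long records = Long.parseLong(props.getProperty("max.poll.records"));
        return intervalMs / records;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and group name.
        Properties largeBatch = build("localhost:9092", "demo-group", 600_000, 500);
        Properties smallBatch = build("localhost:9092", "demo-group", 600_000, 100);
        System.out.println(perRecordBudgetMs(largeBatch)); // 1200
        System.out.println(perRecordBudgetMs(smallBatch)); // 6000
    }
}
```

With a 10-minute interval, a batch of 500 records leaves about 1.2 seconds per record on average, while a batch of 100 leaves about 6 seconds. If real processing time per record is close to or above that budget, the consumer is likely to miss its poll deadline.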

Dead Ends

Common approaches that don't work:

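  1. Increase session.timeout.ms to a very high value 80% fail

    session.timeout.ms only controls how long the coordinator waits for heartbeats before declaring a consumer dead. It has no effect on how long a rebalance takes, and a high value merely delays failure detection without preventing requests from colliding with a rebalance.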

  2. Enable static group membership (group.instance.id) 70% fail

    Static membership reduces rebalance frequency but does not eliminate it. A rebalance still happens when a static member's session times out, when a new member joins the group, or when the partition count of a subscribed topic changes.
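
Why the first dead end fails can be shown with a toy model of the two liveness checks. This is a simplified sketch, not Kafka code: the class TimeoutModel, its enum, and all numeric values are hypothetical. It assumes heartbeats are sent by a background thread independently of poll(), so a consumer that is slow to process records keeps heartbeating but still misses its poll deadline.

```java
public class TimeoutModel {

    public enum Outcome { STAYS_IN_GROUP, LEAVES_ON_POLL_TIMEOUT, EVICTED_ON_SESSION_TIMEOUT }

    /**
     * Toy model of the two checks.
     * heartbeatGapMs: time since the coordinator last received a heartbeat.
     * pollGapMs: time since the application last called poll().
     */
    public static Outcome classify(long heartbeatGapMs, long pollGapMs,
                                   long sessionTimeoutMs, long maxPollIntervalMs) {
        if (heartbeatGapMs > sessionTimeoutMs) {
            return Outcome.EVICTED_ON_SESSION_TIMEOUT; // process or network is dead
        }
        if (pollGapMs > maxPollIntervalMs) {
            return Outcome.LEAVES_ON_POLL_TIMEOUT;     // processing is too slow
        }
        return Outcome.STAYS_IN_GROUP;
    }

    public static void main(String[] args) {
        long heartbeatGapMs = 3_000;   // heartbeats still arrive regularly
        long pollGapMs = 400_000;      // processing has blocked poll() for 400 s
        long maxPollIntervalMs = 300_000;

        // Raising the session timeout from 45 s to 30 min changes nothing here.
        System.out.println(classify(heartbeatGapMs, pollGapMs, 45_000, maxPollIntervalMs));
        System.out.println(classify(heartbeatGapMs, pollGapMs, 1_800_000, maxPollIntervalMs));
        // Raising max.poll.interval.ms is what keeps the consumer in the group.
        System.out.println(classify(heartbeatGapMs, pollGapMs, 45_000, 600_000));
    }
}
```

In this model the first two calls both print LEAVES_ON_POLL_TIMEOUT and the third prints STAYS_IN_GROUP, which matches the advice above: slow processing is governed by max.poll.interval.ms, not by session.timeout.ms.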