communication runtime_error ai_generated true

CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member

ID: communication/kafka-consumer-commit-failed-rebalance

Fix Rate: 85%

Confidence: 88%

Evidence: 1

First Seen: 2023-11-05

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
Apache Kafka 3.4	active	—	—	—
Kafka 3.6	active	—	—	—
confluent-kafka-python 2.3	active	—	—	—
spring-kafka 3.0	active	—	—	—

Root Cause

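The consumer attempted to commit offsets after the group had already rebalanced and reassigned its partitions. This usually happens when the time between two `poll()` calls exceeds `max.poll.interval.ms`: the consumer is treated as failed and removed from the group, its partitions go to another member, and the later commit is rejected.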

Official Documentation

https://kafka.apache.org/documentation/#consumerconfigs_max.poll.interval.ms

Workarounds

90% success Increase `max.poll.interval.ms` above the expected maximum time needed to process one batch, e.g., `max.poll.interval.ms=600000` (10 minutes) in the consumer config.
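
As a rough sketch of how such a value might be chosen, the helper below derives `max.poll.interval.ms` from a worst-case batch time plus a safety margin. The function name, the safety factor, the workload numbers, the broker address, and the group id are all illustrative assumptions; only the config keys are standard consumer properties.

```python
import math

def poll_interval_ms(batch_size, worst_case_ms_per_record, safety_factor=2.0):
    """Return a max.poll.interval.ms value that covers one full batch with headroom."""
    worst_case_batch_ms = batch_size * worst_case_ms_per_record
    return math.ceil(worst_case_batch_ms * safety_factor)

# Hypothetical workload: up to 500 records per batch, 600 ms each in the worst case.
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "group.id": "example-group",            # assumption: illustrative group id
    "enable.auto.commit": False,
    "max.poll.interval.ms": poll_interval_ms(500, 600),
}
```

With these assumed numbers the result matches the 10-minute example above. A larger value also delays detection of a consumer that is genuinely stuck, so the margin is a trade-off rather than a free setting.
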
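85% success Keep the poll loop responsive by moving processing off the polling thread: fetch records, hand them to a separate thread pool, and commit offsets only after all processing completes, using `enable.auto.commit=false` and manual commits. Keep calling `poll()` from the consumer thread while the workers run (with the assigned partitions paused), since time spent away from `poll()` still counts against `max.poll.interval.ms`; all consumer calls, including commits, should stay on that one thread because `KafkaConsumer` is not thread-safe.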
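
One way this pattern might look is sketched below, assuming a consumer object with the method names used by confluent-kafka-python's `Consumer` (`assignment`, `pause`, `resume`, `poll`, `commit`). The function `process_batch`, its parameters, and the `handle` callback are hypothetical names for illustration. The sketch commits synchronously rather than asynchronously so that a failed commit surfaces immediately, and it does not handle a rebalance that happens while a batch is in flight.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def process_batch(consumer, messages, handle, pool, poll_every=1.0):
    """Process `messages` on `pool` while the calling thread keeps polling.

    Returns anything poll() handed back while paused (normally an empty list).
    """
    stray = []
    if not messages:
        return stray
    paused = consumer.assignment()
    consumer.pause(paused)  # stop fetching so poll() only keeps the member alive
    try:
        futures = [pool.submit(handle, m) for m in messages]
        pending = set(futures)
        while pending:
            _, pending = wait(pending, timeout=poll_every)
            msg = consumer.poll(0)  # counts as a poll for max.poll.interval.ms
            if msg is not None:
                stray.append(msg)   # assumption: caller deals with these later
        for f in futures:
            f.result()  # re-raise a worker failure before anything is committed
        # Commit the highest processed offset of each partition in the batch.
        last = {}
        for m in messages:
            key = (m.topic(), m.partition())
            if key not in last or m.offset() > last[key].offset():
                last[key] = m
        for m in last.values():
            consumer.commit(message=m, asynchronous=False)
    finally:
        consumer.resume(paused)
    return stray
```

With confluent-kafka-python, a batch could be obtained with `consumer.consume(num_messages=..., timeout=...)` and passed in together with a shared `ThreadPoolExecutor`. If a worker raises, nothing is committed, and the caller has to decide whether to retry, seek back, or stop.
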
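80% success Enable cooperative rebalancing (the incremental rebalance protocol) by setting `partition.assignment.strategy` to `CooperativeStickyAssignor`, written as `org.apache.kafka.clients.consumer.CooperativeStickyAssignor` in the Java client or `cooperative-sticky` in librdkafka-based clients such as confluent-kafka-python. This lets consumers keep the partitions that do not move during a rebalance.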
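
Because the accepted value differs between client families, a small helper like the following may make the setting less error-prone. The dictionary and function names are hypothetical; the two strategy values are the ones used by the Java client and by librdkafka respectively.

```python
# Value of partition.assignment.strategy per client family.
COOPERATIVE_ASSIGNOR = {
    "java": "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
    "librdkafka": "cooperative-sticky",  # e.g. confluent-kafka-python
}

def with_cooperative_rebalancing(config, client="librdkafka"):
    """Return a copy of `config` with the cooperative sticky assignor selected."""
    updated = dict(config)
    updated["partition.assignment.strategy"] = COOPERATIVE_ASSIGNOR[client]
    return updated
```

All members of a group need a compatible rebalance protocol, so switching a live group over usually calls for a staged rollout rather than changing one consumer at a time in isolation.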

Dead Ends

Common approaches that don't work:

Increase `max.poll.records` to process more records per poll and reduce polling frequency 75% fail
Processing more records per poll increases processing time, exacerbating the rebalance issue.
Disable auto-commit and commit offsets manually after every single record 65% fail
Frequent commits increase load and may still fail if a rebalance occurs between commits.
Set `session.timeout.ms` to a very low value to detect failures faster 80% fail
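This does not help with slow processing, which is governed by `max.poll.interval.ms`, and a very low value causes unnecessary rebalances when healthy consumers miss heartbeats during brief pauses such as GC or network delays.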
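
The dead ends above can be caught with a simple pre-deployment check. The sketch below is only an illustration: the function name and its inputs are hypothetical, the 300000 ms fallback is the documented default for `max.poll.interval.ms`, and the one-third rule for `heartbeat.interval.ms` follows the recommendation in the Kafka consumer configuration docs.

```python
def find_risky_settings(config, batch_size, worst_case_ms_per_record):
    """Return warnings for settings likely to trigger rebalances; empty if none found."""
    warnings = []

    poll_interval = config.get("max.poll.interval.ms", 300000)
    batch_ms = batch_size * worst_case_ms_per_record
    if batch_ms >= poll_interval:
        warnings.append(
            f"worst-case batch time {batch_ms} ms is not below "
            f"max.poll.interval.ms={poll_interval}"
        )

    session = config.get("session.timeout.ms")
    heartbeat = config.get("heartbeat.interval.ms")
    if session is not None and heartbeat is not None and heartbeat * 3 > session:
        warnings.append(
            f"heartbeat.interval.ms={heartbeat} is more than one third of "
            f"session.timeout.ms={session}"
        )
    return warnings
```

For example, a batch of 1000 records at an assumed 500 ms each would be flagged against the default poll interval, which is the situation created by raising the batch size without also reducing per-record work.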