communication
runtime_error
ai_generated
true
CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member
ID: communication/kafka-consumer-commit-failed-rebalance
85% fix rate
88% confidence
Evidence count: 1
First seen: 2023-11-05
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Apache Kafka 3.4 | active | — | — | — |
| Apache Kafka 3.6 | active | — | — | — |
| confluent-kafka-python 2.3 | active | — | — | — |
| spring-kafka 3.0 | active | — | — | — |
Root cause analysis
The Kafka consumer tried to commit offsets after the consumer group had already rebalanced. This usually happens because processing took longer than `max.poll.interval.ms`: the consumer did not call `poll()` again in time, so it was removed from the group and its partitions were assigned to another member.
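As a rough illustration of the timing condition described above, the following sketch estimates the worst-case time spent on one batch and compares it with `max.poll.interval.ms`. The function names and the per-record cost are hypothetical; 500 and 300000 are the Java client defaults for `max.poll.records` and `max.poll.interval.ms`, and the estimate assumes records are handled one after another on the polling thread.

```python
def worst_case_batch_ms(max_poll_records: int, per_record_ms: float) -> float:
    """Upper bound on the time between two poll() calls for serial processing."""
    return max_poll_records * per_record_ms


def exceeds_poll_interval(
    max_poll_records: int,
    per_record_ms: float,
    max_poll_interval_ms: int = 300_000,  # Java client default (5 minutes)
) -> bool:
    """True if a full batch could take longer than max.poll.interval.ms."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) > max_poll_interval_ms
```

With the assumed cost of 700 ms per record, a full batch of 500 records needs 350000 ms, which is above the 300000 ms default, so `exceeds_poll_interval(500, 700)` is true; with the interval raised to 600000 ms it is false.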
Official documentation
https://kafka.apache.org/documentation/#consumerconfigs_max.poll.interval.ms
Solutions
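- Raise `max.poll.interval.ms` above the longest expected processing time for one batch, for example `max.poll.interval.ms=600000` (10 minutes) in the consumer configuration. The default is 300000 (5 minutes).
- Shorten the time between polls by processing asynchronously: fetch records, process them in a separate thread pool, and commit offsets only after all processing has finished, for example with `enable.auto.commit=false` and manual asynchronous commits. The polling thread must keep calling `poll()` while the workers run (typically with the assigned partitions paused), otherwise the poll interval still expires.
- Enable cooperative rebalancing (the incremental rebalance protocol) by setting `partition.assignment.strategy` to the cooperative sticky assignor: `org.apache.kafka.clients.consumer.CooperativeStickyAssignor` in the Java client, `cooperative-sticky` in librdkafka-based clients such as confluent-kafka-python. This lets consumers keep part of their partitions during a rebalance.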
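The three solutions above can be combined roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes confluent-kafka-python, whose `Consumer` provides `consume`, `poll`, `assignment`, `pause`, `resume`, and `commit`. The broker address, group id, and the names `CONSUMER_CONFIG`, `process_one_batch`, and `handle` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Broker address and group id are hypothetical placeholders.
CONSUMER_CONFIG = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "enable.auto.commit": False,          # commit manually, after processing
    "max.poll.interval.ms": 600000,       # 10 minutes
    "partition.assignment.strategy": "cooperative-sticky",
}


def process_one_batch(consumer, handle, pool, batch_size=100, timeout=1.0):
    """Fetch a batch, process it on `pool`, then commit. Returns the batch size."""
    fetched = consumer.consume(batch_size, timeout)
    # Error events are simply skipped in this sketch.
    messages = [m for m in fetched if m.error() is None]
    if not messages:
        return 0

    assigned = consumer.assignment()
    consumer.pause(assigned)  # stop fetching new records while workers run
    try:
        futures = [pool.submit(handle, m) for m in messages]
        # Keep calling poll() so the poll interval does not expire;
        # paused partitions deliver no records in the meantime.
        while not all(f.done() for f in futures):
            consumer.poll(0.1)
        for f in futures:
            f.result()  # re-raise any worker exception before committing
    finally:
        consumer.resume(assigned)

    # Commit the consumer's current position asynchronously.
    consumer.commit(asynchronous=True)
    return len(messages)
```

In a real consumer this function would be called in a loop with a `confluent_kafka.Consumer(CONSUMER_CONFIG)` that has subscribed to a topic and a `ThreadPoolExecutor` sized for the workload. The sketch does not handle a rebalance that happens while workers are running, and a worker failure propagates without any commit.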
Ineffective attempts
Common but ineffective approaches:
- Increase `max.poll.records` to process more records per poll and reduce polling frequency.
  75% failure rate. Processing more records per poll lengthens the time between polls, which makes the rebalance problem worse.
- Disable auto-commit and commit offsets manually after every single record.
  65% failure rate. Frequent commits increase load on the broker, and a commit can still fail if a rebalance occurs between commits.
- Set `session.timeout.ms` to a very low value to detect failures faster.
  80% failure rate. This can trigger unnecessary rebalances when consumers are healthy but occasionally take slightly longer to poll.
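As a contrast to committing after every single record, one possible middle ground is to commit once per fixed number of processed records. The helper below is a hypothetical sketch (the class name `BatchCommitter` and the threshold are assumptions); it only assumes a consumer object with a `commit(asynchronous=...)` method, as in confluent-kafka-python. It reduces commit load but does not prevent a commit from failing after a rebalance.

```python
class BatchCommitter:
    """Commit once every `every` processed records instead of after each one."""

    def __init__(self, consumer, every: int = 100):
        self.consumer = consumer
        self.every = every
        self.pending = 0  # records processed since the last commit

    def record_done(self) -> bool:
        """Count one processed record; commit when the threshold is reached."""
        self.pending += 1
        if self.pending >= self.every:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Commit the consumer's current position if anything is pending."""
        if self.pending:
            self.consumer.commit(asynchronous=False)
            self.pending = 0
```

Calling `flush()` before closing the consumer commits whatever is left over from the last partial batch.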