communication runtime_error ai_generated true

CommitFailedError: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member

ID: communication/kafka-consumer-commit-failed-rebalance

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2023-11-05

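Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| Apache Kafka 3.4 | active | | | |
| Kafka 3.6 | active | | | |
| confluent-kafka-python 2.3 | active | | | |
| spring-kafka 3.0 | active | | | |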

Root cause analysis

A Kafka consumer attempted to commit offsets after the group had already rebalanced, often because processing between polls took longer than `max.poll.interval.ms`. When that limit is exceeded, the consumer is removed from the group and its partitions are assigned to another member, so the commit is rejected.
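
One way to confirm this cause is to measure the gap between consecutive polls. The following is a minimal Python sketch and not part of any Kafka client: the class name `PollIntervalMonitor`, the `warn_ratio` threshold, and the injectable `clock` are hypothetical, and the 300000 ms default mirrors the documented default of `max.poll.interval.ms`.

```python
import time


class PollIntervalMonitor:
    """Tracks the time between polls and flags gaps close to the limit."""

    def __init__(self, max_poll_interval_ms=300_000, warn_ratio=0.8,
                 clock=time.monotonic):
        # Warn once a gap reaches warn_ratio of the allowed interval (assumed threshold).
        self.warn_after_s = max_poll_interval_ms / 1000 * warn_ratio
        self.clock = clock
        self.last_poll = None

    def mark_poll(self):
        """Call right before each poll.

        Returns the gap in seconds since the previous call, or None on the first call.
        """
        now = self.clock()
        gap = None if self.last_poll is None else now - self.last_poll
        self.last_poll = now
        return gap

    def is_risky(self, gap):
        """True when the measured gap is at or above the warning threshold."""
        return gap is not None and gap >= self.warn_after_s
```

Logging the gaps flagged by `is_risky` should show whether processing time is approaching the limit before the commit starts failing.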

generic

Official documentation

https://kafka.apache.org/documentation/#consumerconfigs_max.poll.interval.ms

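Solutions

  1. Increase `max.poll.interval.ms` above the longest expected processing time for one batch, for example `max.poll.interval.ms=600000` (10 minutes) in the consumer configuration.
  2. Reduce the time spent between polls by processing asynchronously: fetch records, process them in a separate thread pool, and commit offsets only after all processing has finished, for example with `enable.auto.commit=false` and manual asynchronous commits. The consumer must keep calling `poll()` while the workers run (pausing its partitions if necessary); otherwise the interval is still exceeded.
  3. Enable cooperative rebalancing (the incremental rebalance protocol) by setting `partition.assignment.strategy` to `org.apache.kafka.clients.consumer.CooperativeStickyAssignor` in the Java client, or to `cooperative-sticky` in librdkafka-based clients such as confluent-kafka-python. Consumers then keep the partitions that are not being moved during a rebalance.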
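
The three solutions can be combined roughly as follows. This is a hedged sketch written against the confluent-kafka-python `Consumer` interface (`consume`, `assignment`, `pause`, `resume`, `poll`, `commit`). The broker address, group id, the function name `process_batch`, and the batch size are assumptions for illustration. The consumer is passed in as a parameter, so the block itself does not import the client library.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Settings from the solutions above; broker address and group id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",   # assumption
    "group.id": "example-group",             # assumption
    "enable.auto.commit": False,             # commit manually after processing
    "max.poll.interval.ms": 600000,          # 10 minutes
    "partition.assignment.strategy": "cooperative-sticky",
}


def process_batch(consumer, pool, handle_record, batch_size=100, poll_timeout=1.0):
    """Fetch one batch, process it on a thread pool, then commit asynchronously.

    Returns the number of records processed.
    """
    fetched = consumer.consume(num_messages=batch_size, timeout=poll_timeout)
    # Error events are skipped here for brevity; real code should inspect them.
    messages = [m for m in fetched if m.error() is None]
    if not messages:
        return 0

    # Pause the assigned partitions so further polls return no new records.
    partitions = consumer.assignment()
    consumer.pause(partitions)
    try:
        futures = [pool.submit(handle_record, m) for m in messages]
        pending = set(futures)
        while pending:
            _, pending = wait(pending, timeout=poll_timeout)
            consumer.poll(0)  # keep polling so the poll interval is not exceeded
        for future in futures:
            future.result()   # re-raise worker errors before committing
        consumer.commit(asynchronous=True)
    finally:
        # Simplification: assumes the assignment did not change while paused.
        consumer.resume(partitions)
    return len(messages)
```

With the real client this would be driven by something like `Consumer(consumer_config)` subscribed to a topic and a `ThreadPoolExecutor`, calling `process_batch` in a loop. If a rebalance revokes partitions while they are paused, the `resume` call and the commit need extra handling that this sketch omits.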

Ineffective attempts

Common but ineffective approaches:

  1. Increase `max.poll.records` to process more records per poll and reduce polling frequency (75% failure rate)

    Processing more records per poll increases processing time, exacerbating the rebalance issue.

  2. Disable auto-commit and commit offsets manually after every single record (65% failure rate)

    Frequent commits increase load and may still fail if a rebalance occurs between commits.

  3. Set `session.timeout.ms` to a very low value to detect failures faster (80% failure rate)

    This can cause unnecessary rebalances if consumers are healthy but take slightly longer to poll.
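
The first item can be checked with simple arithmetic. The sketch below assumes a worst case in which every poll returns a full batch and each record takes a fixed time to process. The 400 ms per-record cost is a made-up figure, the function names are hypothetical, and 300000 ms is the documented default for `max.poll.interval.ms`.

```python
def batch_processing_ms(max_poll_records, per_record_ms):
    """Worst-case time to process one full batch, in milliseconds."""
    return max_poll_records * per_record_ms


def exceeds_poll_interval(max_poll_records, per_record_ms,
                          max_poll_interval_ms=300_000):
    """True if a full batch takes longer than the allowed poll interval."""
    return batch_processing_ms(max_poll_records, per_record_ms) > max_poll_interval_ms


# 500 records at 400 ms each take 200000 ms, within the limit.
print(exceeds_poll_interval(500, 400))    # False
# Doubling the batch size takes 400000 ms, over the limit.
print(exceeds_poll_interval(1000, 400))   # True
```

Under these assumptions, raising `max.poll.records` from 500 to 1000 is enough on its own to push the consumer past the interval.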