kafka runtime_error ai_generated true

org.apache.kafka.common.errors.StaleMemberEpochException: The member epoch 5 has been fenced by the group coordinator

ID: kafka/consumer-group-stale-metadata

82% Fix Rate
83% Confidence
1 Evidence
2024-05-12 First Seen

Root Cause

The consumer's member epoch (its generation within the group) is stale: the group coordinator has rebalanced the group, or the consumer's session has timed out, so the coordinator fences the member and rejects requests that carry the old epoch.
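
The fencing mechanism can be pictured with a toy model. The sketch below is not Kafka's implementation; `ToyCoordinator`, `joinOrRebalance`, and `accept` are hypothetical names invented for illustration. It only shows the idea that a request is accepted when it carries the coordinator's current epoch for that member, and rejected once a rebalance has bumped the epoch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of epoch fencing. This is NOT Kafka code; all names are hypothetical.
class EpochFencingSketch {

    /** Hypothetical stand-in for the group coordinator. */
    static class ToyCoordinator {
        // Current epoch per member id, as tracked by the coordinator.
        private final Map<String, Integer> currentEpochs = new HashMap<>();

        /** Registers a member, or bumps its epoch on rebalance; returns the new epoch. */
        int joinOrRebalance(String memberId) {
            return currentEpochs.merge(memberId, 1, Integer::sum);
        }

        /** Accepts a request only if the caller presents the current epoch. */
        boolean accept(String memberId, int presentedEpoch) {
            Integer current = currentEpochs.get(memberId);
            return current != null && current == presentedEpoch;
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();

        int epoch = coordinator.joinOrRebalance("member-a");        // member joins with epoch 1
        System.out.println(coordinator.accept("member-a", epoch));  // prints "true"

        coordinator.joinOrRebalance("member-a");                    // rebalance bumps the epoch to 2
        System.out.println(coordinator.accept("member-a", epoch));  // prints "false": epoch 1 is stale
    }
}
```

In this model the consumer recovers only by learning the new epoch, which is why the workarounds focus on reacting to rebalances and avoiding unnecessary ones.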

generic

Official Documentation

https://kafka.apache.org/documentation/#consumer_group_rebalancing

Workarounds

  1. 85% success: Implement a consumer rebalance listener to detect rebalances and reset state. For Java consumers: `consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener() { ... })`
  2. 80% success: Set 'heartbeat.interval.ms' well below 'session.timeout.ms', typically no more than one third of it (e.g., heartbeat 1000 ms with session timeout 10000 ms), so heartbeats arrive in time and the session is less likely to expire.
  3. 75% success: If using static group membership, ensure 'group.instance.id' is unique per consumer and stable across restarts. Example: `props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "consumer-1");`
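
The timing and static-membership settings from the workarounds above can be collected in one place. The following is a minimal sketch using only `java.util.Properties` and the plain string config keys; the class `ConsumerConfigSketch`, the method `build`, and the group and instance names are hypothetical. The one-third check follows the usual guidance for these two settings and is an illustrative guard, not something Kafka enforces in this form. If the group uses the newer consumer group protocol, these timeouts may be governed by broker-side settings instead, so check the documentation for your Kafka version.

```java
import java.util.Properties;

// Sketch of consumer settings for heartbeat timing and static membership.
// Class and method names are hypothetical; only the config keys are Kafka's.
class ConsumerConfigSketch {

    /** Builds consumer properties, rejecting heartbeat intervals that are too long. */
    static Properties build(String groupId, String instanceId,
                            int heartbeatMs, int sessionTimeoutMs) {
        // Illustrative guard: heartbeat should be at most one third of the session timeout.
        if (heartbeatMs <= 0 || heartbeatMs * 3L > sessionTimeoutMs) {
            throw new IllegalArgumentException(
                "heartbeat.interval.ms should be positive and at most one third of session.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Static membership: must be unique per consumer and stable across restarts.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatMs));
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Values taken from the workaround text: 1000 ms heartbeat, 10000 ms session timeout.
        Properties props = build("example-group", "consumer-1", 1000, 10000);
        System.out.println(props.getProperty("heartbeat.interval.ms")); // prints "1000"
        System.out.println(props.getProperty("session.timeout.ms"));    // prints "10000"
    }
}
```

The resulting `Properties` object can be passed to a `KafkaConsumer` constructor together with the bootstrap servers and deserializer settings, which are omitted here.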

Dead Ends

Common approaches that don't work:

  1. 50% fail: This can make the consumer unresponsive for longer periods, worsening rebalance issues and causing the group to stall; it does not prevent epoch fencing due to rebalances.
  2. 40% fail: Static membership only prevents unnecessary rebalances during brief disconnections; if the consumer actually fails or is fenced, the epoch will still be stale.
  3. 70% fail: This causes a full rebalance and may temporarily resolve the error, but the underlying cause (e.g., network issues, slow processing) remains, so the error will recur.