kafka runtime_error ai_generated true

org.apache.kafka.common.errors.ReplicaNotAvailableException: Replica for partition my_topic-0 is not available on broker 2

ID: kafka/replica-not-available-on-fetch

Fix Rate: 85%
Confidence: 87%
Evidence: 1
First Seen: 2023-08-01

Version Compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| kafka_2.13-3.4.0 | active | | | |
| kafka_2.13-3.5.1 | active | | | |
| kafka_2.13-3.6.0 | active | | | |

Root Cause

A follower replica is not fully caught up with the leader and cannot serve fetch requests, often due to replication lag or the replica being offline.
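One rough way to spot this condition is to compare each partition's assigned replicas with its in-sync replica (ISR) set. The sketch below parses text shaped like the output of `kafka-topics.sh --describe`. The sample output and the helper name `out_of_sync_replicas` are assumptions made for illustration, and the exact columns may differ between Kafka versions.

```python
import re

# Hypothetical sample shaped like `kafka-topics.sh --describe` output;
# real output may carry extra fields depending on the Kafka version.
SAMPLE_DESCRIBE = (
    "Topic: my_topic\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,3\n"
    "Topic: my_topic\tPartition: 1\tLeader: 2\tReplicas: 2,3,1\tIsr: 2,3,1\n"
)

# Matches per-partition lines; topic summary lines do not match and are skipped.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)"
    r".*?Replicas:\s*(?P<replicas>[\d,]*)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn a comma-separated broker list such as '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in csv.split(",") if x]


def out_of_sync_replicas(describe_output):
    """Map (topic, partition) -> broker IDs that are assigned but not in the ISR."""
    lagging = {}
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        isr = set(_ids(match["isr"]))
        missing = [b for b in _ids(match["replicas"]) if b not in isr]
        if missing:
            lagging[(match["topic"], int(match["partition"]))] = missing
    return lagging


print(out_of_sync_replicas(SAMPLE_DESCRIBE))
# → {('my_topic', 0): [2]}
```

In this assumed sample, broker 2 is assigned to `my_topic-0` but absent from its ISR, which matches the broker named in the error message.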



Official Documentation

https://kafka.apache.org/documentation/#replication

Workarounds

  1. 85% success Check whether the replica is in sync using `kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my_topic` and compare the `Replicas` and `Isr` lists for the partition. If the follower is missing from the ISR because it is lagging, increase `replica.fetch.max.bytes` and `num.replica.fetchers` on the broker.
  2. 80% success Restart the broker hosting the unavailable replica (broker 2) to force re-sync with the leader. Example: `systemctl restart kafka` on broker 2.
  3. 90% success If the replica is permanently stuck, reassign the partition to a different broker using `kafka-reassign-partitions.sh` with a custom reassignment JSON.
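For the reassignment workaround, the JSON passed to `kafka-reassign-partitions.sh` lists, per partition, the full set of brokers that should host it. The following is a minimal sketch that builds such a file's contents by swapping the stuck broker for a healthy one. The helper name `build_reassignment`, the current replica list `[1, 2, 3]`, and the spare broker ID `4` are assumptions for illustration, not values taken from a real cluster.

```python
import json


def build_reassignment(topic, partition, current_replicas, stuck_broker, new_broker):
    """Return a reassignment plan that swaps stuck_broker for new_broker.

    The replacement keeps the stuck broker's position in the list, so the
    first entry (the preferred leader) changes only if it was the stuck one.
    """
    if stuck_broker not in current_replicas:
        raise ValueError("stuck_broker is not in the current replica list")
    if new_broker in current_replicas:
        raise ValueError("new_broker already hosts this partition")
    replicas = [new_broker if b == stuck_broker else b for b in current_replicas]
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
        ],
    }


# Assumed layout: my_topic-0 lives on brokers 1, 2, 3 and broker 4 is a healthy spare.
plan = build_reassignment("my_topic", 0, [1, 2, 3], stuck_broker=2, new_broker=4)
print(json.dumps(plan))
# → {"version": 1, "partitions": [{"topic": "my_topic", "partition": 0, "replicas": [1, 4, 3]}]}
```

Saving the printed JSON as, say, `reassign.json`, the plan could then be applied with `kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file reassign.json --execute` and checked afterwards by rerunning the same command with `--verify` in place of `--execute`.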


Dead Ends

Common approaches that don't work:

  1. 85% fail

    This reduces durability but does not make the replica available; the follower still lags and cannot serve fetches.

  2. 80% fail

    This controls fetch size, not lag; the replica is not available due to being out of sync, not due to fetch limits.

  3. 95% fail

    This removes the replica entirely, causing data loss and requiring re-replication, which may make it even less available.