kafka system_error ai_generated partial

LogDirOfflineException: One or more log directories are offline.

ID: kafka/log-dir-offline

Fix Rate: 70%
Confidence: 83%
Evidence: 1
First Seen: 2024-04-02

Root Cause

A disk failure or filesystem error has made one or more Kafka data directories (the paths listed in `log.dirs`) inaccessible. The broker marks each affected directory offline, and the partition replicas stored there become unavailable on that broker.

generic


Official Documentation

https://kafka.apache.org/documentation/#log_dirs

Workarounds

  1. 70% success: Identify the offline directory from the broker logs: `grep -i 'offline' /var/log/kafka/server.log`. Stop the broker, then unmount the affected disk and check it with `fsck`, or replace the disk. After the repair, remount the directory and restart the broker. Example: `sudo umount /data/kafka && sudo fsck -y /dev/sdb1`, then `sudo mount /data/kafka && kafka-server-start.sh config/server.properties`. Run the two halves separately: `fsck` exits with status 1 when it has corrected errors, so a single `&&` chain would stop before the remount even after a successful repair.
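
The repair sequence in this workaround can be sketched as a small script. This is a minimal sketch, not a vetted procedure: it assumes the example mount point `/data/kafka` and device `/dev/sdb1` from the step above, that Kafka's `bin` directory is on `PATH`, and that the mount point has an `/etc/fstab` entry. The `run` and `repair_log_dir` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# repair_log_dir MOUNT_POINT DEVICE (hypothetical helper)
repair_log_dir() {
  local mount_point="$1" device="$2" rc=0

  # Stop the broker first so it holds no open files on the disk.
  # kafka-server-stop.sh only sends the signal and fails if no broker is
  # running; wait for the process to exit before continuing in real use.
  run kafka-server-stop.sh || echo "broker was not running" >&2

  run sudo umount "$mount_point"

  # fsck exit status: 0 = clean, 1 = errors corrected, higher = problems remain.
  run sudo fsck -y "$device" || rc=$?
  if [ "$rc" -gt 1 ]; then
    echo "fsck exited with status $rc; not remounting $mount_point" >&2
    return "$rc"
  fi

  run sudo mount "$mount_point"   # assumes an /etc/fstab entry for this path
  run kafka-server-start.sh -daemon config/server.properties
}

# Example values taken from the workaround text; adjust for the real host.
repair_log_dir /data/kafka /dev/sdb1
```

Running the script as-is lists the five commands in order. Setting `DRY_RUN=0` executes them, so review the printed plan against the actual mount point and device first.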


Dead Ends

Common approaches that don't work:

  1. 90% fail

    Restarting the broker without repairing the disk reproduces the same error: on startup the broker finds the directory inaccessible and marks it offline again.

  2. 95% fail

    Increasing log retention or cleanup policies doesn't fix a hardware failure; the underlying disk must be repaired or replaced.
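
Since a restart only helps once the directories are usable again, a pre-flight check before restarting can save a failed attempt. The following is a rough sketch under stated assumptions: `check_log_dirs` is a hypothetical helper, it reads only the `log.dirs` key from the properties file passed to it, and it does not handle paths containing whitespace. A directory that passes this check can still sit on a failing disk, so it complements `fsck` rather than replacing it.

```shell
#!/usr/bin/env bash

# check_log_dirs PROPERTIES_FILE (hypothetical helper)
# Prints "OK <dir>" or "FAIL <dir>" for each directory listed in log.dirs.
# Returns 0 if all are usable, 1 if any is missing or not writable,
# and 2 if the file has no log.dirs entry.
check_log_dirs() {
  local props="$1" dirs dir status=0
  local -a dir_list

  # Use the last log.dirs line, since later entries override earlier ones.
  # Whitespace is stripped, so paths containing spaces are not supported.
  dirs="$(grep -E '^[[:space:]]*log\.dirs[[:space:]]*=' "$props" \
            | tail -n 1 | cut -d= -f2- | tr -d '[:space:]')"

  if [ -z "$dirs" ]; then
    # Kafka falls back to the log.dir setting when log.dirs is unset.
    echo "no log.dirs entry found in $props" >&2
    return 2
  fi

  # log.dirs is a comma-separated list of directories.
  IFS=',' read -r -a dir_list <<< "$dirs"
  for dir in "${dir_list[@]}"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "OK $dir"
    else
      echo "FAIL $dir"
      status=1
    fi
  done
  return "$status"
}

# Usage (run as the user the broker runs as, so the write check is meaningful):
#   check_log_dirs config/server.properties && kafka-server-start.sh config/server.properties
```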