database network_error ai_generated partial

redis.exceptions.MasterDownError: Error: Master is down or unreachable

ID: database/redis-master-link-down

Fix rate: 85%
Confidence: 86%
Evidence: 1
First seen: 2024-03-15

Version Compatibility

Version      Status  Introduced  Deprecated  Notes
Redis 6.2.x  active
Redis 7.0.x  active
Redis 7.2.x  active

Root Cause

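The client is talking to a replica whose replication link to the master is down (master_link_status:down in INFO replication). When replica-serve-stale-data is set to no, that replica rejects data commands with a MASTERDOWN error. redis-py raises this error as redis.exceptions.MasterDownError. Common causes are:

  - a network partition between the replica and the master,
  - a master crash, or
  - a misconfigured replicaof directive.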
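A minimal Python sketch of the condition described above. It assumes you have the raw text of INFO replication as printed by redis-cli. redis-py's Redis.info("replication") already returns a parsed dict, so the parsing step is only needed for raw text. The helper names parse_info and link_is_down are hypothetical, and the sample output is illustrative, not captured from a real server.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse INFO output ("key:value" lines, "#" section headers) into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, sep, value = line.partition(":")  # split on the first ":" only
        if sep:
            fields[key] = value
    return fields


def link_is_down(info_text: str) -> bool:
    """True if this node is a replica whose link to the master is not up."""
    fields = parse_info(info_text)
    return fields.get("role") == "slave" and fields.get("master_link_status") != "up"


# Illustrative replica output while the master is unreachable (hypothetical values).
sample = """# Replication
role:slave
master_host:10.0.0.1
master_port:6379
master_link_status:down
master_link_down_since_seconds:42
"""

print(link_is_down(sample))  # True
```

A replica in this state returns MASTERDOWN for data commands only when replica-serve-stale-data is no. With the default yes, it keeps serving possibly stale data instead.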

generic

Official Documentation

https://redis.io/docs/latest/operate/oss_and_stack/management/replication/

Workarounds

  1. (85% success) Check the replica's link to the master and restart the master if it is down:
     a. On the replica, check the link status: redis-cli -h replica_host INFO replication | grep master_link_status
     b. If it reports down, check whether the master answers: redis-cli -h master_host PING (a healthy master replies PONG).
     c. If the master is unresponsive, restart it on the master host: systemctl restart redis-server (the unit name varies by distribution).
  2. (90% success) In a failover scenario, promote the replica to master and repoint the rest:
     a. Promote the replica: redis-cli -h replica_host REPLICAOF NO ONE (SLAVEOF NO ONE is the older name, deprecated since Redis 5.0, and still works).
     b. Reconfigure each remaining replica to follow the new master: redis-cli -h other_replica_host REPLICAOF new_master_host new_master_port
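The failover in workaround 2 can be planned as a list of redis-cli invocations before anything runs. This sketch makes two assumptions: every node listens on the same port, and you have already chosen which replica to promote. failover_commands is a hypothetical helper that only builds argument lists and does not execute them. It uses REPLICAOF, the name Redis 5.0 and later prefer over the deprecated SLAVEOF alias.

```python
def failover_commands(new_master: str, replicas: list[str], port: int = 6379) -> list[list[str]]:
    """Build redis-cli argument lists that promote new_master and repoint the other replicas."""
    if new_master not in replicas:
        raise ValueError("new_master must be one of the current replicas")
    # Step 1: stop replication on the chosen replica so it becomes a master.
    commands = [["redis-cli", "-h", new_master, "-p", str(port), "REPLICAOF", "NO", "ONE"]]
    # Step 2: point every remaining replica at the new master.
    for host in replicas:
        if host != new_master:
            commands.append(
                ["redis-cli", "-h", host, "-p", str(port), "REPLICAOF", new_master, str(port)]
            )
    return commands


# Hypothetical host names, for illustration only.
for cmd in failover_commands("replica-a", ["replica-a", "replica-b", "replica-c"]):
    print(" ".join(cmd))
```

After review, each list could be passed to subprocess.run(cmd, check=True). Clients that still point at the old master also need to be reconfigured, which this sketch does not cover.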

Dead Ends

Common approaches that don't work:

  1. Restarting only the replica node (90% fail)

    If the master is down, restarting the replica does not fix the connection; the replica will still fail to sync.

  2. Increasing repl-timeout in redis.conf without checking master health (70% fail)

    A longer repl-timeout only delays when the replica declares the link down. It does not address why the master is unavailable.