ERM redis runtime_error ai_generated partial

SENTINEL: quorum not reached for failover

ID: redis/sentinel-quorum-failure

Fix Rate: 85%
Confidence: 87%
Evidence: 1
First Seen: 2024-03-05

Version Compatibility

Version      Status  Introduced  Deprecated  Notes
Redis 6.2.0  active
Redis 7.0.0  active
Redis 7.2.0  active

Root Cause

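The Sentinel nodes could not agree that the master had failed, so no failover was started. This usually happens because a network partition isolates the Sentinels from each other or because too few Sentinel instances are running. A failover needs two things: at least quorum Sentinels must agree that the master is unreachable, and a majority of all known Sentinels must authorize the failover.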
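
A minimal sketch of the arithmetic behind this failure, assuming the rule described in the Sentinel documentation: at least quorum Sentinels must agree that the master is down, and a majority of all known Sentinels must authorize the failover. The function name failover_possible and its parameters are hypothetical and only model the counting. This is not how Sentinel itself is implemented.

```python
def failover_possible(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Simplified model: can the reachable Sentinels detect and authorize a failover?

    Hypothetical helper for illustration only. It assumes every reachable
    Sentinel already sees the master as down.
    """
    # Majority is counted over all known Sentinels, including unreachable ones.
    majority = total_sentinels // 2 + 1
    # Step 1: enough Sentinels must agree that the master is unreachable.
    agrees_master_is_down = reachable_sentinels >= quorum
    # Step 2: the failover must be authorized by a majority, and by at least
    # quorum Sentinels when quorum is set higher than the majority.
    can_authorize = reachable_sentinels >= max(quorum, majority)
    return agrees_master_is_down and can_authorize


if __name__ == "__main__":
    # Each case is (total_sentinels, reachable_sentinels, quorum).
    for case in [(3, 3, 2), (3, 2, 2), (3, 1, 2), (2, 1, 1)]:
        print(case, failover_possible(*case))
```

In this model the case (2, 1, 1) cannot fail over even though the quorum is 1, because one Sentinel out of two is not a majority. This is one way "insufficient sentinel instances" shows up in practice.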

generic

Official Documentation

https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/

Workarounds

  1. 90% success: Run at least 3 Sentinel instances with quorum set to 2. Set it with SENTINEL SET <master-name> quorum 2 on every Sentinel, because the command changes only the instance that receives it. Then check with INFO sentinel that each instance reports the expected number of Sentinels. SENTINEL CKQUORUM <master-name> reports whether the quorum and the majority are currently reachable.
  2. 85% success: Check network connectivity between Sentinels by running redis-cli -h <sentinel-ip> -p 26379 PING against each one. Fix any firewall rules or DNS resolution problems that block the replies.
  3. 80% success: If a Sentinel is down, restart that instance on its own and wait for it to rejoin the others. Monitor progress with SENTINEL MASTER <master-name>.
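
The connectivity and INFO sentinel checks above can be scripted. The following is a rough sketch that uses only the Python standard library. The helper names sentinel_ping and known_sentinels are hypothetical, the sample INFO text is made up for illustration, and the sketch assumes the Sentinels accept unauthenticated inline commands on their port.

```python
import socket
from typing import Optional


def sentinel_ping(host: str, port: int = 26379, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def known_sentinels(info_text: str, master_name: str) -> Optional[int]:
    """Extract the sentinels=N field for one master from INFO sentinel output."""
    for line in info_text.splitlines():
        if not line.startswith("master") or ":" not in line:
            continue
        # Line shape: master0:name=mymaster,status=ok,address=...,sentinels=3
        _, _, fields = line.partition(":")
        parsed = dict(item.split("=", 1) for item in fields.strip().split(","))
        if parsed.get("name") == master_name:
            return int(parsed["sentinels"])
    return None


# Made-up sample in the shape of INFO sentinel output (assumption, not real data).
SAMPLE_INFO = (
    "# Sentinel\r\n"
    "sentinel_masters:1\r\n"
    "sentinel_tilt:0\r\n"
    "master0:name=mymaster,status=ok,address=10.0.0.5:6379,slaves=2,sentinels=3\r\n"
)

if __name__ == "__main__":
    print(known_sentinels(SAMPLE_INFO, "mymaster"))
```

To use it, call sentinel_ping once per Sentinel address. An instance that returns False, or that reports fewer Sentinels than expected, is a reasonable starting point for the firewall, DNS, or restart checks listed above.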

Dead Ends

Common approaches that don't work:

  1. 75% fail: Forcing a manual failover (SENTINEL FAILOVER <master-name>) bypasses the agreement between Sentinels and can cause split-brain if the original master recovers.
  2. 70% fail: Raising the quorum makes failover harder to achieve, which increases downtime during real failures.
  3. 80% fail: Restarting all Sentinels simultaneously can cause them to lose state and delay failover, and it does not fix network issues.
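
To illustrate why a higher quorum works against failover, here is a small sketch that counts how many Sentinel outages a deployment can absorb. It assumes a failover needs the larger of the quorum and a majority of all known Sentinels. The function name tolerated_sentinel_failures is hypothetical.

```python
def tolerated_sentinel_failures(total_sentinels: int, quorum: int) -> int:
    """How many Sentinels can be lost while a failover is still possible.

    Simplified, hypothetical model: a failover needs max(quorum, majority)
    Sentinels that can reach each other.
    """
    if not 1 <= quorum <= total_sentinels:
        raise ValueError("quorum must be between 1 and total_sentinels")
    majority = total_sentinels // 2 + 1
    needed = max(quorum, majority)
    return total_sentinels - needed


if __name__ == "__main__":
    # Sweep the quorum for a 5-Sentinel deployment.
    for quorum in range(1, 6):
        print(quorum, tolerated_sentinel_failures(5, quorum))
```

Under this model, five Sentinels tolerate two outages for any quorum up to 3, one outage at quorum 4, and none at quorum 5.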