ERR
redis
network_error
ai_generated
partial
ERR CLUSTERDOWN The cluster is down - node timeout, replica not synced
ID: redis/cluster-node-timeout-replica
Fix Rate: 82%
Confidence: 87%
Evidence: 1
First Seen: 2024-01-18
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| 7.2.0 | active | — | — | — |
| 7.4.0 | active | — | — | — |
| 8.0.0 | active | — | — | — |
Root Cause
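A cluster node stopped responding within cluster-node-timeout and its replica is not fully synced, so the replica may be unable to take over the node's hash slots. The cluster then loses slot coverage or quorum, marks itself as down, and returns CLUSTERDOWN to clients.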
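The down state described above shows up in the output of CLUSTER INFO. The following is a minimal sketch, assuming the raw text of CLUSTER INFO has already been fetched (for example with redis-cli). The function name `summarize_cluster_info` and the sample values are illustrative assumptions, not data from a real incident.

```python
TOTAL_SLOTS = 16384  # Redis Cluster always has 16384 hash slots


def summarize_cluster_info(raw: str) -> dict:
    """Summarize the health-related fields of CLUSTER INFO output."""
    fields = {}
    for line in raw.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank or malformed lines
            fields[key] = value

    assigned = int(fields.get("cluster_slots_assigned", 0))
    failed = int(fields.get("cluster_slots_fail", 0))
    state = fields.get("cluster_state", "unknown")
    return {
        "state": state,
        "unassigned_slots": TOTAL_SLOTS - assigned,
        "failed_slots": failed,
        "healthy": state == "ok" and assigned == TOTAL_SLOTS and failed == 0,
    }


# Illustrative sample only: one master's slots are marked as failed.
sample = """cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10923
cluster_slots_pfail:0
cluster_slots_fail:5461
cluster_known_nodes:6
cluster_size:3
"""
print(summarize_cluster_info(sample))
```

A non-zero `failed_slots` together with `state` equal to `fail` is consistent with the situation this entry describes. It does not by itself say why the replica did not take over.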
Official Documentation
https://redis.io/docs/manual/scaling/
Workarounds
- 80% success: Force the replica to resync by running CLUSTER REPLICATE <master-node-id> on the replica. Then confirm that replication has completed with INFO replication (master_link_status:up) and that CLUSTER INFO reports cluster_state:ok.
- 85% success: If the master is permanently down, promote the replica to master by running CLUSTER FAILOVER FORCE on the replica node.
- 70% success: Raise cluster-node-timeout (for example, to 30000 ms) to tolerate transient network issues: CONFIG SET cluster-node-timeout 30000. CONFIG SET applies only to the node it is sent to, so repeat it on every node.
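Waiting for a replica to finish syncing can be checked from the replica's own INFO replication output. This is a minimal sketch, assuming that output has already been captured as text. The helper names `parse_info` and `replica_is_synced` and the sample values are hypothetical. The field names (`role`, `master_link_status`, `master_sync_in_progress`) are the ones Redis reports in the replication section.

```python
def parse_info(raw: str) -> dict:
    """Parse `key:value` lines from INFO output, skipping `# Section` headers."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_is_synced(info_replication: str) -> bool:
    """Return True when a replica reports an active link and no sync in progress."""
    info = parse_info(info_replication)
    return (
        info.get("role") == "slave"  # INFO still reports replicas as "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


# Illustrative samples only.
still_syncing = """# Replication
role:slave
master_link_status:down
master_sync_in_progress:1
"""
synced = """# Replication
role:slave
master_link_status:up
master_sync_in_progress:0
"""
print(replica_is_synced(still_syncing), replica_is_synced(synced))
```

Polling this check until it returns `True` before attempting a failover or declaring the cluster recovered is one reasonable way to apply the first workaround. The polling interval and the overall timeout are left to the operator.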
Dead Ends
Common approaches that don't work:
- 70% fail: Restarting the failed node without fixing the replica sync leads to the same timeout again, because the replica is still behind.
- 50% fail: Increasing cluster-node-timeout alone, without addressing network issues or replica sync lag, will not prevent future timeouts.
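Both dead ends come down to a replica that is still behind. A rough way to see how far behind is to compare offsets in the master's INFO replication output. This is a sketch under assumptions: the function name `replica_lag_bytes`, the addresses, and the offsets below are made up for illustration, and the output is assumed to have been captured as text from the master node.

```python
def replica_lag_bytes(master_info: str) -> dict:
    """Map each replica address to its byte lag behind master_repl_offset."""
    master_offset = None
    replica_offsets = {}
    for line in master_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # section headers and blank lines
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            # e.g. slave0:ip=10.0.0.12,port=6379,state=online,offset=1500,lag=1
            attrs = dict(item.split("=", 1) for item in value.split(","))
            address = f'{attrs["ip"]}:{attrs["port"]}'
            replica_offsets[address] = int(attrs["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found in INFO output")
    return {addr: master_offset - off for addr, off in replica_offsets.items()}


# Illustrative sample only.
master_sample = """# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.12,port=6379,state=online,offset=1500,lag=1
master_repl_offset:2000
"""
print(replica_lag_bytes(master_sample))
```

A lag that keeps growing across repeated checks suggests the replica is not catching up, in which case a restart or a longer timeout is unlikely to help on its own.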