ERM redis system_error ai_generated true

Error: Replication backlog buffer overflow, disconnecting replica

ID: redis/replica-repl-backlog-overflow

Fix rate: 85%
Confidence: 88%
Evidence: 1
First seen: 2024-01-10

Version Compatibility

Version       Status  Introduced  Deprecated  Notes
Redis 6.2.6   active  -           -           -
Redis 7.0.12  active  -           -           -
Redis 7.2.4   active  -           -           -

Root Cause

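The replication stream on the primary outgrew the buffers configured for it. The replication backlog (repl-backlog-size) is a fixed-size circular buffer: once it is full, the oldest writes are overwritten, so a replica that lags by more than the backlog can no longer resume with a partial resynchronization. The pending output for each replica is also capped by client-output-buffer-limit replica, and a replica that exceeds that cap is disconnected to prevent memory exhaustion on the primary.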
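
To make the backlog mechanism concrete, here is a simplified model of a fixed-size replication backlog. It assumes offsets count bytes of the replication stream (as master_repl_offset does) and that a replica resuming from offset N needs byte N + 1 to still be retained. The class and method names are hypothetical, and this is a sketch, not Redis's internal implementation.

```python
class BacklogModel:
    """Simplified model of a fixed-size replication backlog (illustrative only).

    Offsets count bytes of the replication stream, like master_repl_offset.
    """

    def __init__(self, size_bytes: int) -> None:
        self.size = size_bytes      # plays the role of repl-backlog-size
        self.master_offset = 0      # total bytes ever written to the stream
        self.histlen = 0            # bytes currently retained in the backlog

    def feed(self, nbytes: int) -> None:
        """Append nbytes of write commands; once full, the oldest bytes are overwritten."""
        self.master_offset += nbytes
        self.histlen = min(self.size, self.histlen + nbytes)

    def first_byte_offset(self) -> int:
        """Offset of the oldest byte still held in the backlog."""
        return self.master_offset - self.histlen + 1

    def can_partial_resync(self, replica_offset: int) -> bool:
        """True if the next byte the replica needs (replica_offset + 1) is still retained."""
        needed = replica_offset + 1
        return self.first_byte_offset() <= needed <= self.master_offset + 1


backlog = BacklogModel(size_bytes=100)
backlog.feed(60)
print(backlog.can_partial_resync(0))    # True: nothing has been overwritten yet
backlog.feed(60)                        # 120 bytes written into a 100-byte backlog
print(backlog.can_partial_resync(0))    # False: bytes 1..20 were overwritten
print(backlog.can_partial_resync(50))   # True: byte 51 is still retained
```

The model shows that a full backlog does not grow: it wraps, and a replica that has fallen behind the oldest retained byte needs a full resynchronization.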

generic

Official Documentation

https://redis.io/docs/latest/operate/oss_and_stack/replication/

Workarounds

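  1. 90% success: Increase the replication backlog size: CONFIG SET repl-backlog-size 100mb. A larger backlog lets a replica that falls behind or briefly disconnects resume with a partial resynchronization instead of a full sync. CONFIG SET does not survive a restart, so also persist the value in redis.conf (for example with CONFIG REWRITE).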
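  2. 80% success: Reduce replication lag, chiefly by improving network bandwidth between the primary and its replicas, and monitor it with INFO replication. Adding replicas can spread read traffic, but every extra replica also adds replication load on the primary.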
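  3. 85% success: Tune the client output buffer limits for replicas: CONFIG SET client-output-buffer-limit "replica 256mb 64mb 60" (the value must be quoted as a single argument). The three values are the hard limit, the soft limit, and the soft-limit duration in seconds: the primary disconnects a replica whose output buffer reaches 256 MB, or stays at or above 64 MB for more than 60 seconds. These are the default values; raise them if healthy but slow replicas are being dropped. The limits cap how much memory a single slow replica can consume on the primary.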
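
A rough way to choose a backlog size instead of a fixed 100mb is to cover the longest replica disconnect you want to tolerate at the observed write rate. The sketch below assumes this rule of thumb (observed bytes per second × tolerated outage × safety factor), which is a sizing heuristic rather than an official Redis formula. The function names, the outage window, and the safety factor are hypothetical inputs you would choose yourself.

```python
import math

MB = 1024 * 1024  # Redis config units: "1mb" means 1024*1024 bytes


def recommended_backlog_bytes(offset_start: int, offset_end: int, interval_s: float,
                              outage_s: float, safety: float = 2.0) -> int:
    """Backlog size that covers `outage_s` seconds of writes at the observed rate.

    offset_start/offset_end: two samples of master_repl_offset taken interval_s apart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if offset_end < offset_start:
        raise ValueError("offset decreased; the primary may have restarted or failed over")
    write_rate = (offset_end - offset_start) / interval_s  # bytes per second
    return math.ceil(write_rate * outage_s * safety)


def as_config_value(nbytes: int) -> str:
    """Round up to whole megabytes (minimum 1mb) for CONFIG SET repl-backlog-size."""
    return f"{max(1, math.ceil(nbytes / MB))}mb"


# Example: master_repl_offset grew by 60,000,000 bytes in 60 s (about 1 MB/s of writes),
# and replicas should survive a 30 s disconnect with a 2x safety margin.
size = recommended_backlog_bytes(1_000_000, 61_000_000, interval_s=60, outage_s=30)
print(size, as_config_value(size))  # 60000000 58mb
```

Sampling master_repl_offset over a busy period gives a more conservative rate than a quiet one.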
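
For the monitoring step, the output of INFO replication can be turned into a per-replica lag report. This sketch parses the text format (for example as captured from redis-cli INFO replication), computes how many bytes each replica trails master_repl_offset, and checks whether its position is still inside the backlog. The sample values are made up, the helper names are hypothetical, and replica offsets are only as fresh as the replica's last acknowledgement (replicas acknowledge roughly once per second).

```python
import re
from typing import Dict, List, Tuple

REPLICA_KEY = re.compile(r"^slave\d+$")  # INFO names replica entries slave0, slave1, ...


def parse_info_replication(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split `INFO replication` output into top-level fields and per-replica entries."""
    fields: Dict[str, str] = {}
    replicas: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, _, value = line.partition(":")  # split on the first colon only (IPv6-safe)
        if REPLICA_KEY.match(key):
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
        else:
            fields[key] = value
    return fields, replicas


def replica_lag_report(text: str) -> List[Dict[str, object]]:
    fields, replicas = parse_info_replication(text)
    master_offset = int(fields["master_repl_offset"])
    backlog_active = fields.get("repl_backlog_active") == "1"
    first_byte = int(fields.get("repl_backlog_first_byte_offset", "0"))
    report = []
    for r in replicas:
        offset = int(r["offset"])
        report.append({
            "replica": f'{r["ip"]}:{r["port"]}',
            "state": r["state"],
            "lag_bytes": master_offset - offset,
            "lag_seconds": int(r.get("lag", "-1")),
            # Partial resync needs the byte after the replica's offset to still be in the backlog.
            "partial_resync_possible": backlog_active and offset + 1 >= first_byte,
        })
    return report


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=8500,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=2000,lag=12
master_repl_offset:9000
repl_backlog_active:1
repl_backlog_size:5000
repl_backlog_first_byte_offset:4001
repl_backlog_histlen:5000
"""

for row in replica_lag_report(SAMPLE):
    print(row)
```

In this sample, the second replica trails by 7000 bytes, more than the 5000-byte backlog holds, so it would need a full resynchronization if it reconnected.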

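The three numbers in the client-output-buffer-limit workaround can be read as a small state machine. The sketch below models that reading of client-output-buffer-limit replica <hard> <soft> <soft-seconds>: an immediate disconnect at the hard limit, and a disconnect once usage has stayed at or above the soft limit for longer than the soft-limit duration (a limit of 0 disables it). The class name and the sampling approach are hypothetical, and exact boundary handling, such as strict versus inclusive comparison of the elapsed seconds, is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

MB = 1024 * 1024


@dataclass
class OutputBufferLimit:
    """Mirrors `client-output-buffer-limit replica <hard> <soft> <soft-seconds>`; 0 disables a limit."""
    hard_bytes: int
    soft_bytes: int
    soft_seconds: int

    def disconnect_time(self, samples: Iterable[Tuple[int, int]]) -> Optional[int]:
        """Return the first timestamp at which the replica would be dropped, or None.

        samples: (unix_seconds, output_buffer_bytes) pairs in time order.
        """
        soft_since: Optional[int] = None
        for t, used in samples:
            if self.hard_bytes and used >= self.hard_bytes:
                return t  # hard limit: disconnect immediately
            if self.soft_bytes and used >= self.soft_bytes:
                if soft_since is None:
                    soft_since = t  # start the soft-limit timer
                elif t - soft_since > self.soft_seconds:
                    return t  # above the soft limit for too long
            else:
                soft_since = None  # dropping below the soft limit resets the timer
        return None


default = OutputBufferLimit(256 * MB, 64 * MB, 60)
print(default.disconnect_time([(0, 70 * MB), (30, 80 * MB), (61, 90 * MB)]))  # 61
print(default.disconnect_time([(0, 70 * MB), (30, 10 * MB), (61, 70 * MB)]))  # None
print(default.disconnect_time([(5, 300 * MB)]))                                # 5
```

The second case shows why a replica that only briefly spikes above the soft limit survives: the timer resets whenever usage drops back below it.
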
Dead Ends

Common approaches that don't work:

  1. 80% fail

    Setting the backlog size to zero disables partial resynchronization, so every reconnect forces a full sync and increases network load.

  2. 75% fail

    Restarting clears the backlog temporarily but does not address the root cause of high write volume or slow replicas.

  3. 50% fail

    Targeting large keys misses the cause: the backlog fills with the byte volume of write commands being propagated (including any values they write), not with the size of keys already stored.
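
To see why write volume rather than key size fills the backlog, it helps to count the bytes each write adds to the replication stream, which Redis carries in the RESP protocol. This sketch encodes commands as RESP arrays of bulk strings. The helper name is hypothetical, and the counts are approximate because Redis may rewrite some commands before propagating them and also sends its own SELECT and periodic PING commands to replicas.

```python
from typing import Union


def resp_command(*args: Union[str, bytes]) -> bytes:
    """Encode a command as a RESP array of bulk strings, the format of the replication stream."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


small_write = len(resp_command("INCR", "counter"))
big_write = len(resp_command("SET", "blob", b"x" * (1024 * 1024)))
print(small_write)             # 27
print(big_write)               # 1048611
print(40_000 * small_write)    # 1080000
```

With these numbers, about 40,000 INCR calls on a single tiny key add slightly more to the stream than one SET of a 1 MiB value.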