ERM redis runtime_error ai_generated partial

ERR Slot migration timed out for slot 1234

ID: redis/cluster-slot-migration-timeout

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2023-08-15

Version Compatibility

Version      Status  Introduced  Deprecated  Notes
Redis 6.2.0  active
Redis 7.0.0  active
Redis 7.2.0  active

Root Cause

A cluster slot migration operation exceeded the configured timeout, typically due to network congestion or large key migration.
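To check whether large keys are a plausible cause for a given slot, one option is to sample the slot's keys and ask Redis for their approximate sizes. The sketch below is an illustration, not part of the original entry: `large_keys_in_slot` is a hypothetical helper, and `client` is assumed to be any object with an `execute_command(*args)` method (for example a redis-py client) connected directly to the node that owns the slot. It relies only on the CLUSTER GETKEYSINSLOT and MEMORY USAGE commands.

```python
from typing import Any, List, Tuple


def large_keys_in_slot(
    client: Any,
    slot: int,
    threshold_bytes: int = 10 * 1024 * 1024,  # assumed 10 MB cutoff
    sample: int = 1000,                       # assumed max keys to inspect
) -> List[Tuple[Any, int]]:
    """Return (key, size) pairs in `slot` above the threshold, largest first.

    `client` is a hypothetical stand-in for a connection to the source node;
    only its execute_command(*args) method is used.
    """
    # CLUSTER GETKEYSINSLOT <slot> <count> returns up to <count> key names.
    keys = client.execute_command("CLUSTER", "GETKEYSINSLOT", slot, sample)

    oversized = []
    for key in keys:
        # MEMORY USAGE returns an estimated size in bytes, or nil (None)
        # if the key disappeared between the two calls.
        size = client.execute_command("MEMORY", "USAGE", key)
        if size is not None and size > threshold_bytes:
            oversized.append((key, size))

    oversized.sort(key=lambda pair: pair[1], reverse=True)
    return oversized
```

MEMORY USAGE samples nested values by default, so the sizes are estimates; they are usually good enough to spot the few keys that dominate a slot.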

Official Documentation

https://redis.io/docs/latest/operate/oss_and_stack/management/cluster/

Workarounds

  1. 85% success Clear the stuck migration state and retry with smaller batches. CLUSTER SETSLOT <slot> MIGRATING starts a migration rather than aborting one; to clear the state, run CLUSTER SETSLOT <slot> STABLE on both the source and the destination node, or let redis-cli --cluster fix repair the open slot, which also accounts for keys that were already moved. Example: redis-cli -h source-node CLUSTER SETSLOT 1234 STABLE; then rerun redis-cli --cluster reshard with --cluster-pipeline set below its default of 10 to limit the number of keys moved per MIGRATE call.
  2. 75% success Increase the migration timeout and retry. Redis 6.2 through 7.2 do not document a cluster-migration-timeout setting, so CONFIG SET cluster-migration-timeout is unlikely to be accepted; the timeout that redis-cli applies to each MIGRATE call is set with --cluster-timeout, in milliseconds (default 60000, i.e. 60 seconds).
  3. 80% success Identify and split large keys (e.g., >10MB) before migration to avoid the timeout.
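The first two workarounds can be sketched as command builders. This is a minimal illustration under stated assumptions, not a definitive procedure: `clear_slot_state_cmd` and `reshard_cmd` are hypothetical helper names, the host names and node IDs are placeholders, and the sketch assumes the stuck state is cleared with CLUSTER SETSLOT ... STABLE and that the retry uses the `--cluster-pipeline` and `--cluster-timeout` options of `redis-cli --cluster reshard`. The functions only build argument lists; nothing is executed.

```python
from typing import List


def clear_slot_state_cmd(host: str, port: int, slot: int) -> List[str]:
    """Build the redis-cli call that clears MIGRATING/IMPORTING state for a slot.

    Run the resulting command against both the source and destination node.
    """
    return [
        "redis-cli", "-h", host, "-p", str(port),
        "CLUSTER", "SETSLOT", str(slot), "STABLE",
    ]


def reshard_cmd(
    entry: str,          # any reachable cluster node, as "host:port"
    from_id: str,        # source node ID (placeholder)
    to_id: str,          # destination node ID (placeholder)
    slots: int,          # number of slots to move, not a slot number
    pipeline: int = 5,   # keys per MIGRATE call; redis-cli defaults to 10
    timeout_ms: int = 120000,  # MIGRATE timeout; redis-cli defaults to 60000
) -> List[str]:
    """Build a non-interactive reshard with a smaller batch and longer timeout."""
    if pipeline < 1 or timeout_ms < 1 or slots < 1:
        raise ValueError("pipeline, timeout_ms and slots must be positive")
    return [
        "redis-cli", "--cluster", "reshard", entry,
        "--cluster-from", from_id,
        "--cluster-to", to_id,
        "--cluster-slots", str(slots),
        "--cluster-pipeline", str(pipeline),
        "--cluster-timeout", str(timeout_ms),
        "--cluster-yes",
    ]
```

The lists can be passed to `subprocess.run` or printed with `shlex.join` for review first. Note that `reshard` moves a number of slots chosen by redis-cli rather than one specific slot, so moving slot 1234 alone would require the manual CLUSTER SETSLOT and MIGRATE sequence instead.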

Dead Ends

Common approaches that don't work:

  1. 70% fail Restarting nodes without addressing the underlying migration issue can cause data inconsistency and longer downtime.
  2. 90% fail Deleting slot data breaks cluster integrity and leads to data loss; the slot must be properly reassigned.
  3. 60% fail Setting a very high timeout masks the problem and may lead to long stalls; it does not fix a root cause such as large keys.
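Before restarting nodes or touching slot data, it can help to confirm which slots are actually still open. The sketch below is an illustrative helper, not part of the original entry: `open_slots` is a hypothetical name, and it assumes the input is the raw text returned by CLUSTER NODES, where a node lists its own migrating slots as `[slot->-node-id]` and importing slots as `[slot-<-node-id]`.

```python
import re
from typing import List, Tuple

# Matches the special slot entries CLUSTER NODES prints for open slots.
_OPEN_SLOT = re.compile(r"^\[(\d+)(->-|-<-)([0-9a-f]+)\]$")


def open_slots(cluster_nodes_output: str) -> List[Tuple[int, str, str]]:
    """Return (slot, state, peer_node_id) for every open slot in the output.

    `state` is "migrating" or "importing"; `peer_node_id` is the other
    node involved in the migration.
    """
    found = []
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        # Fields 0-7 are id, address, flags, master, ping, pong, epoch and
        # link state; slot entries start at index 8.
        for token in fields[8:]:
            match = _OPEN_SLOT.match(token)
            if match:
                state = "migrating" if match.group(2) == "->-" else "importing"
                found.append((int(match.group(1)), state, match.group(3)))
    return found
```

An empty result from both the source and the destination node suggests the slot state is already clean; a non-empty one points to the slot that still needs to be repaired or reassigned.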