K8S-LEADER-001 kubernetes system_error ai_generated true

Election: leader election lost

ID: kubernetes/leader-election-lost

Fix rate: 80%
Confidence: 85%
Evidence count: 1
First seen: 2023-06-15

Version compatibility

Version            Status    Introduced    Deprecated    Notes
kubernetes 1.23    active
kubernetes 1.24    active
kubernetes 1.25    active
kubernetes 1.28    active

Root cause analysis

A controller or operator pod lost its lease lock because of a network partition, a pod restart, or an etcd timeout, causing a temporary leadership gap.
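
To make the idea of a "lost lease" concrete, here is a minimal sketch of how one might decide whether a Lease has lapsed, using the `renewTime` and `leaseDurationSeconds` fields of a `coordination.k8s.io/v1` Lease spec. The sample values and the holder name are hypothetical, and the check is a simplification: real leader-election clients typically compare against their own locally observed time to tolerate clock skew.

```python
from datetime import datetime, timedelta, timezone


def parse_micro_time(value: str) -> datetime:
    """Parse a timestamp such as '2023-06-15T10:00:00.000000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def lease_is_expired(spec: dict, now: datetime) -> bool:
    """Return True when the last renewal is older than the lease duration."""
    renewed_at = parse_micro_time(spec["renewTime"])
    duration = timedelta(seconds=spec["leaseDurationSeconds"])
    return now > renewed_at + duration


# Hypothetical lease spec, shaped like the "spec" section of a Lease object.
sample_spec = {
    "holderIdentity": "example-controller-0",  # assumed name, for illustration
    "leaseDurationSeconds": 15,
    "renewTime": "2023-06-15T10:00:00.000000Z",
}
```

If the holder stops renewing (for example during a network partition), `lease_is_expired` starts returning True once the duration has elapsed, and another replica may then take over the lease.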

generic

Official documentation

https://kubernetes.io/docs/concepts/architecture/controller/

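Solutions

  1. Scale the controller Deployment down to 0, wait 30 seconds, then scale it back up to 1 to force a clean leader election.
  2. Check for network policies or firewall rules that may block the controller replicas from reaching the Kubernetes API server, through which the lease is acquired and renewed. Port 2380 is the etcd peer port used between etcd members, so check it only if etcd itself reports timeouts.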
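
The scale-down and scale-up in step 1 could be scripted roughly as follows. This is a sketch under assumptions: the deployment name and namespace are placeholders, `kubectl` is assumed to be installed and pointed at the affected cluster, and the final `kubectl get leases` step is added here only to let you confirm that a holder has been re-elected.

```python
import subprocess
import time


def restart_plan(deployment: str, namespace: str, pause_seconds: int = 30) -> list:
    """Build the ordered steps: scale to 0, pause, scale to 1, list leases."""
    scale = ["kubectl", "scale", f"deployment/{deployment}", "--namespace", namespace]
    return [
        scale + ["--replicas=0"],
        pause_seconds,  # an int entry means "sleep this many seconds"
        scale + ["--replicas=1"],
        ["kubectl", "get", "leases", "--namespace", namespace],
    ]


def execute(plan: list, run=subprocess.run, sleep=time.sleep) -> None:
    """Run each step in order; abort on the first failing command."""
    for step in plan:
        if isinstance(step, int):
            sleep(step)
        else:
            run(step, check=True)


# Example (placeholder names; this would act on a real cluster if uncommented):
# execute(restart_plan("my-controller", "my-namespace"))
```

Building the plan separately from running it makes it possible to print and review the commands before anything touches the cluster.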

Ineffective attempts

Common but ineffective approaches:

  1. Restart all replicas of the controller simultaneously. 65% failure rate

    Restarting all replicas at once can cause a prolonged leader election storm, making the problem worse.

  2. Delete the lease object in etcd manually. 80% failure rate

    Manually deleting the lease may cause data inconsistency and is not recommended; the leader election mechanism should self-heal.