K8S-LEADER-001
kubernetes
system_error
ai_generated: true
Election: leader election lost
ID: kubernetes/leader-election-lost
Fix rate: 80%
Confidence: 85%
Evidence count: 1
First seen: 2023-06-15
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| kubernetes 1.23 | active | — | — | — |
| kubernetes 1.24 | active | — | — | — |
| kubernetes 1.25 | active | — | — | — |
| kubernetes 1.28 | active | — | — | — |
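Root cause analysis
A controller or operator pod lost its Lease lock because it could not renew the lease before its renew deadline. Typical causes are a network partition, a pod restart, or API server/etcd timeouts. Until another replica acquires the expired lease, no replica is leader, so there is a temporary leadership gap.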
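The following minimal simulation shows where the temporary leadership gap comes from. It is not client-go's implementation. It assumes a simplified Lease with a holder, a last renew time and a duration, and a leader that gives up once it misses its renew deadline. The `Lease` class, the function names and the 15 s / 10 s timings are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified Lease record; field names loosely mirror the
# coordination.k8s.io/v1 Lease fields holderIdentity, renewTime, leaseDurationSeconds.
@dataclass
class Lease:
    holder_identity: Optional[str]
    renew_time: float              # seconds; time of the last successful renewal
    lease_duration_seconds: float


def is_expired(lease: Lease, now: float) -> bool:
    """Other candidates may take over only after the lease duration has elapsed."""
    return now > lease.renew_time + lease.lease_duration_seconds


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt."""
    held_by_other = lease.holder_identity not in (None, candidate)
    if held_by_other and not is_expired(lease, now):
        return False  # another replica still holds a live lease
    lease.holder_identity = candidate
    lease.renew_time = now
    return True


def should_step_down(last_successful_renew: float, now: float, renew_deadline: float) -> bool:
    """A leader that cannot renew within renew_deadline stops acting as leader."""
    return now - last_successful_renew > renew_deadline


# Illustrative scenario: pod-a leads and is partitioned at t=0, so it never renews again.
LEASE_DURATION = 15.0
RENEW_DEADLINE = 10.0
lease = Lease(holder_identity="pod-a", renew_time=0.0, lease_duration_seconds=LEASE_DURATION)

timeline = []
for t in range(20):
    now = float(t)
    a_leads = not should_step_down(0.0, now, RENEW_DEADLINE)  # pod-a's last renew was t=0
    b_leads = try_acquire_or_renew(lease, "pod-b", now)       # pod-b retries every second
    timeline.append((t, a_leads, b_leads))

gap = [t for t, a, b in timeline if not a and not b]
print("leaderless seconds:", gap)  # prints: leaderless seconds: [11, 12, 13, 14, 15]
```

The renew deadline is shorter than the lease duration. As a result, the partitioned leader steps down before another replica can take over, and two leaders never run at once. The trade-off is a short window with no leader at all, from t=11 to t=15 in this run.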
Official documentation
https://kubernetes.io/docs/concepts/architecture/controller/
Solutions
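- Scale the controller Deployment down to 0, wait 30 seconds, then scale it back up to 1. This forces a clean leader election.
- Check for NetworkPolicies or firewall rules that block the controller pods from reaching the Kubernetes API server. The API server listens on port 6443 by default. Controllers renew their Lease through the API server and never talk to etcd directly. Port 2380 is the etcd peer port used between etcd members, so blocking it affects controllers only indirectly, through etcd health.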
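The two remediation steps above can be scripted. The sketch below only builds the `kubectl scale` commands and does not run them. It also performs a plain TCP reachability check against the API server. It assumes it runs inside a pod that is subject to the same NetworkPolicies as the controller, where `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` are set. The helper names and the Deployment/namespace names are hypothetical.

```python
import os
import shlex
import socket
from typing import List


def forced_reelection_commands(deployment: str, namespace: str,
                               replicas: int = 1, wait_seconds: int = 30) -> List[List[str]]:
    """Build (but do not execute) argv lists for: scale to 0, wait, scale back up."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "scale", target, "--replicas=0", "-n", namespace],
        ["sleep", str(wait_seconds)],
        ["kubectl", "scale", target, f"--replicas={replicas}", "-n", namespace],
    ]


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Plain TCP connect check: True if a connection could be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical names; substitute your controller's Deployment and namespace.
    for argv in forced_reelection_commands("my-controller", "my-namespace"):
        print(shlex.join(argv))

    # Inside a pod, these variables point at the API server's in-cluster Service.
    host = os.environ.get("KUBERNETES_SERVICE_HOST")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "443"))
    if host:
        print("API server reachable:", can_reach(host, port))
```

Building the commands separately from running them lets you review the exact `kubectl` invocations first. You can then pass each list to `subprocess.run(argv, check=True)` once you are satisfied.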
Ineffective attempts
Common but ineffective approaches:
- Restart all replicas of the controller simultaneously.
  Failure rate: 65%
  Restarting all replicas at once can cause a prolonged leader election storm, which makes the problem worse.
- Manually delete the Lease object in etcd.
  Failure rate: 80%
  Manually deleting the lease can cause data inconsistency and is not recommended. The leader election mechanism recovers on its own once the stale lease expires.