K8S-LEADER-001
kubernetes
system_error
ai_generated
true
Election: leader election lost
ID: kubernetes/leader-election-lost
80% Fix Rate
85% Confidence
1 Evidence
2023-06-15 First Seen
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| kubernetes 1.23 | active | — | — | — |
| kubernetes 1.24 | active | — | — | — |
| kubernetes 1.25 | active | — | — | — |
| kubernetes 1.28 | active | — | — | — |
Root Cause
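A controller or operator pod failed to renew its leader-election lease (usually a Lease object in the coordination.k8s.io API group, read and written through the API server) before its renew deadline, because of a network partition, a pod restart, or an etcd timeout. The pod stops leading, and no replica leads until another candidate acquires the lease after it expires, which leaves a temporary leadership gap.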
generic
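The failure mode can be illustrated with a small simulation. This is a sketch under stated assumptions, not client-go's implementation: the lease is a local object rather than a Lease written through the API server, the timings (15 s lease duration, 10 s renew deadline, 2 s retry period) are assumed values matching common defaults, and the names Lease, try_acquire_or_renew, and simulate are hypothetical. It shows that a leader cut off from the API server steps down once its renew deadline passes, while a standby replica can only take over after the lease itself expires; the time between the two is the leadership gap.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed timings, chosen to match common client-go / controller-runtime defaults.
LEASE_DURATION = 15.0  # seconds a lease stays valid after its last renewal
RENEW_DEADLINE = 10.0  # a leader that cannot renew for this long steps down
RETRY_PERIOD = 2.0     # how often every candidate tries to acquire or renew


@dataclass
class Lease:
    holder: Optional[str] = None
    renew_time: float = float("-inf")

    def expired(self, now: float) -> bool:
        return now >= self.renew_time + LEASE_DURATION


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Take the lease if it is free or expired, or renew it if we already hold it."""
    if lease.holder in (None, candidate) or lease.expired(now):
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


def simulate(partition_start: float, partition_end: float,
             horizon: float = 60.0) -> Optional[Tuple[float, float]]:
    """Return (time pod-a stopped leading, time pod-b acquired), or None if no failover.

    pod-a starts as leader and cannot reach the API server while
    partition_start <= t < partition_end; pod-b is a standby that can always reach it.
    """
    lease = Lease()
    a_leading, a_last_renew = False, 0.0
    lost_at: Optional[float] = None
    for i in range(int(horizon / RETRY_PERIOD) + 1):
        now = i * RETRY_PERIOD
        if lost_at is None:  # after losing leadership, pod-a exits (a common client-go pattern)
            reachable = not (partition_start <= now < partition_end)
            if reachable and try_acquire_or_renew(lease, "pod-a", now):
                a_leading, a_last_renew = True, now
            elif a_leading and now - a_last_renew >= RENEW_DEADLINE:
                a_leading, lost_at = False, now  # renew deadline passed: step down
        if try_acquire_or_renew(lease, "pod-b", now) and lost_at is not None:
            return lost_at, now
    return None


print(simulate(partition_start=5.0, partition_end=float("inf")))
print(simulate(partition_start=5.0, partition_end=9.0))
```

With a partition that starts at t = 5 s and never heals, the leader's last renewal is at 4 s, it steps down at 14 s, and the standby acquires the lease at 20 s, a 6-second gap. A partition that heals before the renew deadline (5 s to 9 s) causes no failover.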
Official Documentation
https://kubernetes.io/docs/concepts/architecture/controller/
Workarounds
- 85% success: Scale the controller Deployment down to 0 replicas, wait 30 seconds, then scale it back up to 1 to force a clean leader election. The 30-second wait typically outlasts the previous holder's lease (15 seconds is a common default lease duration), so the new pod does not have to wait for it to expire.
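- 75% success: Check network policies or firewall rules that may block the controller pods from reaching the Kubernetes API server (the in-cluster kubernetes Service on port 443, typically backed by port 6443 on the control-plane nodes). Replicas do not communicate with each other or with etcd during leader election; each one renews its lease through the API server. Port 2380 is the etcd peer port and only carries traffic between etcd members.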
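The scale-down/scale-up workaround can be scripted so the wait is not skipped. This is a minimal sketch: the deployment name and namespace in the usage line are hypothetical, the helpers restart_plan and run_plan are illustrative names, and it only shells out to the standard kubectl scale and kubectl rollout status commands. It defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess
import time


def restart_plan(deployment: str, namespace: str, replicas: int = 1) -> list:
    """Return the kubectl commands for the scale-down / scale-up workaround."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "scale", target, "--replicas=0", "-n", namespace],
        ["kubectl", "scale", target, f"--replicas={replicas}", "-n", namespace],
        # Block until the new pod(s) report available.
        ["kubectl", "rollout", "status", target, "-n", namespace],
    ]


def run_plan(deployment: str, namespace: str, replicas: int = 1,
             wait_seconds: float = 30.0, dry_run: bool = True) -> None:
    """Print (dry run) or execute the plan, pausing after the scale-down."""
    for step, cmd in enumerate(restart_plan(deployment, namespace, replicas)):
        print(("[dry-run] " if dry_run else "") + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        if step == 0:
            # Give the old leader's lease time to expire before a new pod starts.
            print(f"wait {wait_seconds:.0f}s")
            if not dry_run:
                time.sleep(wait_seconds)


run_plan("my-operator", "operators")  # hypothetical names; pass dry_run=False to execute
```

Keeping the plan as data (restart_plan) separate from execution (run_plan) makes it easy to review the exact commands before running them against a live cluster.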
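Because each replica renews its lease through the API server, a first network check is whether the controller pod can open a TCP connection to it at all. The sketch below assumes Python is available wherever it runs (for example in a debug container in the pod's network namespace); api_server_reachable and in_cluster_api_address are hypothetical helper names. It reads KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT, which Kubernetes injects into pods, and falls back to localhost:443 elsewhere.

```python
import os
import socket


def api_server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, network unreachable, ...
        return False


def in_cluster_api_address() -> tuple:
    """API server address injected into pods; localhost:443 outside a cluster."""
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "127.0.0.1")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "443"))
    return host, port
```

Calling api_server_reachable(*in_cluster_api_address()) from the pod returns False if a NetworkPolicy or firewall drops the traffic (usually seen as a timeout). A True result only proves TCP connectivity; TLS, authentication, and RBAC permission to update the Lease are not checked.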
Dead Ends
Common approaches that don't work:
- 65% fail: Restart all replicas of the controller simultaneously. Restarting every replica at once makes them all contend for the lease at the same moment while no pod is leading, which can prolong the election storm and make the problem worse.
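- 80% fail: Manually delete the lease object in etcd. Deleting the lease by hand, especially by writing to etcd directly instead of going through the API server, may leave the cluster state inconsistent and is not recommended; the leader-election mechanism recovers on its own once the stale lease expires.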
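Instead of deleting the lease, it is usually safer to inspect it and wait for it to expire. This sketch assumes the input is a Lease decoded from the JSON that kubectl get lease NAME -n NAMESPACE -o json prints; it reads the real coordination.k8s.io/v1 spec fields holderIdentity, renewTime, and leaseDurationSeconds, while parse_micro_time and lease_status are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def parse_micro_time(value: str) -> datetime:
    """Parse a Kubernetes MicroTime string such as '2023-06-15T10:00:00.123456Z' (UTC)."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def lease_status(lease: dict, now: datetime) -> Tuple[Optional[str], bool, float]:
    """Return (holder, expired, seconds_remaining) for a Lease decoded from JSON.

    `now` must be timezone-aware (UTC).
    """
    spec = lease.get("spec", {})
    holder = spec.get("holderIdentity")
    renew_time = spec.get("renewTime")
    duration = spec.get("leaseDurationSeconds")
    if not holder or renew_time is None or duration is None:
        # Nobody holds the lease, so any candidate may acquire it.
        return None, True, 0.0
    expires_at = parse_micro_time(renew_time) + timedelta(seconds=duration)
    remaining = (expires_at - now).total_seconds()
    return holder, remaining <= 0, max(remaining, 0.0)
```

If lease_status reports the lease as expired but no replica takes it over, the problem lies with the candidates (crash loops, API server access) rather than with the lease itself, so deleting it would not help.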