K8S-LEADER-001 kubernetes system_error ai_generated true

Election: leader election lost

ID: kubernetes/leader-election-lost


80% Fix Rate

85% Confidence

1 Evidence

2023-06-15 First Seen

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
kubernetes 1.23	active	—	—	—
kubernetes 1.24	active	—	—	—
kubernetes 1.25	active	—	—	—
kubernetes 1.28	active	—	—	—

A controller or operator pod lost its lease lock due to network partition, pod restart, or etcd timeout, causing a temporary leadership gap.
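The leadership gap described above comes from lease timing: a standby may only take over after the current holder has failed to renew for a full lease duration. The sketch below models that rule in Python. The 15-second lease duration and 2-second retry period are assumed values in the style of common client-go defaults, not figures from this entry, and the function names are hypothetical. Real implementations compare against the time a change was last observed locally rather than the raw renew timestamp, so treat this as a simplification.

```python
from datetime import datetime, timedelta

# Assumed timings (illustrative, not taken from this entry).
LEASE_DURATION = timedelta(seconds=15)
RETRY_PERIOD = timedelta(seconds=2)


def lease_expired(renew_time: datetime, now: datetime,
                  lease_duration: timedelta = LEASE_DURATION) -> bool:
    """Return True once the holder has gone a full lease duration without renewing."""
    return now > renew_time + lease_duration


def worst_case_gap(lease_duration: timedelta = LEASE_DURATION,
                   retry_period: timedelta = RETRY_PERIOD) -> timedelta:
    """Rough upper bound on the leaderless window after the last renewal:
    the lease has to expire, then a standby notices on its next retry."""
    return lease_duration + retry_period
```

With these assumed values, a holder that stops renewing leaves the controller without a leader for roughly 17 seconds at most, which is why the gap is usually temporary.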

generic


85% success: Scale down the controller deployment to 0, wait 30 seconds, then scale back up to 1 to force a clean leader election. The names in angle brackets are placeholders.
```
kubectl scale deployment <controller-deployment> -n <namespace> --replicas=0 && sleep 30
kubectl scale deployment <controller-deployment> -n <namespace> --replicas=1
```
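75% success: Check network policies or firewall rules that may block the controller pods from reaching the Kubernetes API server. Lease locks are read and renewed through the API server, so that path matters more than port 2380 (the etcd peer port), which carries only etcd member-to-member traffic. The namespace in angle brackets is a placeholder.
```
kubectl get networkpolicy -n <namespace>
kubectl get lease -n <namespace>
```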
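The scale-down, wait, scale-up sequence from the first fix can also be scripted. The following is a minimal sketch, assuming kubectl is on the PATH and already pointed at the right cluster. The function names (scale_cmd, force_reelection) and any deployment or namespace names passed in are hypothetical; the only real CLI surface used is kubectl scale with -n and --replicas.

```python
import subprocess
import time
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _kubectl(cmd: Sequence[str]) -> None:
    """Run one command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def scale_cmd(deployment: str, namespace: str, replicas: int) -> List[str]:
    """Build a `kubectl scale` command for the given deployment."""
    return ["kubectl", "scale", f"deployment/{deployment}",
            "-n", namespace, f"--replicas={replicas}"]


def force_reelection(deployment: str, namespace: str,
                     wait_seconds: float = 30.0,
                     run: Runner = _kubectl,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Scale to 0, wait, then scale back to 1 so one fresh pod acquires the lease."""
    run(scale_cmd(deployment, namespace, 0))
    sleep(wait_seconds)  # give the old lease time to expire
    run(scale_cmd(deployment, namespace, 1))
```

Calling force_reelection("my-controller", "my-namespace") would run the two scale commands 30 seconds apart. The run and sleep parameters are injectable so the sequence can be rehearsed without touching a cluster.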


Common approaches that don't work:

Restart all replicas of the controller simultaneously. 65% fail
Restarting all replicas at once can cause a prolonged leader election storm, making the problem worse.
Delete the lease object in etcd manually. 80% fail
Manually deleting the lease may cause data inconsistency and is not recommended; the leader election mechanism should self-heal.
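Rather than deleting the lease, it is usually safer to read it and see who holds it and how recently it was renewed. The sketch below summarizes the JSON that a command such as kubectl get lease <lease-name> -n <namespace> -o json would print. It assumes the coordination.k8s.io/v1 Lease fields holderIdentity, leaseDurationSeconds, renewTime, and leaseTransitions, and assumes renewTime uses the microsecond UTC format shown in the comment. The function name summarize_lease and the returned keys are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Assumed renewTime layout, e.g. "2023-06-15T12:00:00.000000Z"
RENEW_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarize_lease(lease_json: str, now: datetime) -> Dict[str, Any]:
    """Report the holder, transition count, and renewal age of a Lease object."""
    spec = json.loads(lease_json).get("spec", {})
    duration = spec.get("leaseDurationSeconds")
    renew_raw = spec.get("renewTime")

    age = None
    if renew_raw:
        renew = datetime.strptime(renew_raw, RENEW_FORMAT).replace(tzinfo=timezone.utc)
        age = (now - renew).total_seconds()

    return {
        "holder": spec.get("holderIdentity"),
        "transitions": spec.get("leaseTransitions"),
        "seconds_since_renew": age,
        # Stale means the holder has not renewed within its own lease duration.
        "stale": age is not None and duration is not None and age > duration,
    }
```

A stale lease with no live holder should be picked up by a standby on its own; a transition count that keeps climbing points to repeated leadership changes worth investigating before any manual intervention.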