kubernetes
system_error
ai_generated
partial
etcdserver: request timed out, possible leader election
ID: kubernetes/etcd-leader-election-failure
Fix rate: 70%
Confidence: 88%
Evidence count: 1
First seen: 2023-09-05
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| etcd 3.5.7 | active | — | — | — |
| etcd 3.5.9 | active | — | — | — |
| Kubernetes 1.27 | active | — | — | — |
| Kubernetes 1.29 | active | — | — | — |
Root cause analysis
The etcd cluster is experiencing a network partition or high disk I/O latency, which causes leader election to fail or take too long. Kubernetes API requests that depend on etcd then time out.
Official documentation
https://etcd.io/docs/v3.5/faq/#what-does-request-timed-out-mean
Solutions
- Check etcd cluster health with `etcdctl endpoint health --cluster`. For each unhealthy member, check disk I/O with `iostat -x 1` and network latency to the other etcd nodes with `ping`.
- If disk I/O latency is high, move the etcd data directory to a faster disk (e.g., an SSD): update the hostPath in the etcd pod spec or mount a dedicated volume, then point etcd at it, for example `--data-dir=/var/lib/etcd-ssd`.
- If a network partition is suspected, ensure all etcd members can reach each other on port 2380 (peer communication). Check firewall rules and network policies.
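The peer-connectivity check can be scripted so it is repeatable across nodes. The following is a minimal sketch, not an official etcd tool: the function names `check_peer_port` and `report_peers` and the host names `etcd-1` to `etcd-3` are hypothetical, and it assumes `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available on the node.

```shell
# Triage sketch for "etcdserver: request timed out, possible leader election".
# Assumptions: bash with /dev/tcp support and coreutils `timeout` are installed.
# Function and host names below are illustrative, not part of etcd.

# check_peer_port HOST [PORT]
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# PORT defaults to 2380, the etcd peer port.
check_peer_port() {
  local host="$1" port="${2:-2380}"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# report_peers HOST...
# Prints one OK/FAIL line per peer and returns non-zero if any peer
# cannot be reached on port 2380.
report_peers() {
  local rc=0 host
  for host in "$@"; do
    if check_peer_port "$host"; then
      echo "OK   $host:2380"
    else
      echo "FAIL $host:2380"
      rc=1
    fi
  done
  return "$rc"
}

# Example run on an etcd node (host names are placeholders):
#   etcdctl endpoint health --cluster
#   report_peers etcd-1 etcd-2 etcd-3
#   iostat -x 1 5
```

A `FAIL` line only shows that the TCP connection could not be opened from the node where the script runs. Run it from each member in turn, since a partition or firewall rule may block traffic in one direction only.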
Ineffective attempts
Common but ineffective approaches:
- 70% failure rate: Restarting a single etcd member may worsen the situation by triggering another leader election.
- 60% failure rate: Increasing the etcd request timeout without fixing the underlying disk or network issue only masks the problem temporarily.