kubernetes system_error ai_generated true

Error: node 'worker-node-1' not found: kubelet is not posting node status

ID: kubernetes/kubelet-node-status-notfound

Fix rate: 82%
Confidence: 88%
Evidence count: 1
First seen: 2023-04-05

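Version Compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Kubernetes 1.24 | active | | | |
| Kubernetes 1.25 | active | | | |
| Kubernetes 1.26 | active | | | |
| kubeadm 1.25.0 | active | | | |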

Root Cause Analysis

The kubelet on the node has stopped reporting its status to the API server, usually because the kubelet crashed, the node lost network connectivity, or its client certificate expired. As a result, the node is marked NotReady or its node object is removed.

generic
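
To tell the cases in the root cause apart, it can help to read the node's `Ready` condition from the control plane. The following is a minimal sketch, not an official tool: the function names `classify_ready_status` and `node_ready_status` are hypothetical, and the second one assumes `kubectl` is configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helper: map the value of the node's Ready condition
# to a short explanation.
classify_ready_status() {
  case "$1" in
    True)    echo "healthy: kubelet is posting status" ;;
    False)   echo "kubelet reports the node as not ready" ;;
    Unknown) echo "kubelet stopped posting status" ;;
    "")      echo "node object not found" ;;
    *)       echo "unrecognized status: $1" ;;
  esac
}

# Hypothetical helper: read the Ready condition status for node $1.
# Prints nothing if the node object is missing or kubectl fails.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null
}

# Usage (requires cluster access):
#   classify_ready_status "$(node_ready_status worker-node-1)"
```

An `Unknown` status points at the kubelet or the network path to the API server, while an empty result suggests the node object itself has been removed.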

Official Documentation

https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

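Solutions

  1. SSH into the node and check the kubelet: `systemctl status kubelet`. If it is stopped, restart it with `systemctl restart kubelet`, then inspect the logs: `journalctl -u kubelet -n 50`. A common cause is an expired client certificate; check its validity dates with `openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text -noout`. The kubelet normally rotates this certificate automatically, and `kubeadm certs renew` does not cover it, so if it has expired, obtain new credentials by re-joining the node (step 2) and then restart the kubelet.
  2. If the node is unreachable, delete the node object from the cluster: `kubectl delete node worker-node-1`. Once the node is reachable again, re-join it with `kubeadm join` using a valid token (run `kubeadm reset` on the node first if it was joined before). This forces a fresh registration.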
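
The checks in step 1 can be sketched as a small script to run on the affected node. This is an illustration under assumptions, not a supported tool: the function names are hypothetical, the 7-day warning threshold is arbitrary, and the script assumes root privileges, systemd, and `openssl` on the node.

```shell
#!/bin/sh
# Sketch only: run on the affected node as root.
# Certificate path taken from step 1 above; override with CERT=... if needed.
CERT="${CERT:-/var/lib/kubelet/pki/kubelet-client-current.pem}"

# Convert whole days to seconds (openssl -checkend takes seconds).
days_to_seconds() { echo $(( $1 * 86400 )); }

# Return 0 if the certificate in $1 is still valid $2 seconds from now.
# Returns non-zero if it expires sooner, or if the file cannot be read.
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1
}

# Hypothetical wrapper around the manual checks.
diagnose_kubelet() {
  if ! systemctl is-active --quiet kubelet; then
    echo "kubelet is not running; restarting"
    systemctl restart kubelet
  fi
  journalctl -u kubelet -n 50 --no-pager

  if ! cert_valid_for "$CERT" 0; then
    echo "client certificate expired or unreadable: $CERT"
  elif ! cert_valid_for "$CERT" "$(days_to_seconds 7)"; then
    echo "client certificate expires within 7 days: $CERT"
  else
    echo "client certificate is valid"
  fi
}

# Usage: diagnose_kubelet
```

Using `-checkend` avoids parsing dates by hand; it only reports whether the certificate outlives the given window, so the full `-text` output remains useful for reading the exact dates.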

Failed Attempts

Common approaches that do not work:

  1. 90% failure rate

    Restarting the API server does not fix the node issue; the kubelet must be fixed on the node itself.

  2. 70% failure rate

    Deleting and re-creating the node object in Kubernetes without fixing the kubelet will result in the same error because the new node will also fail to report status.
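
Because re-creating the node object only helps once the kubelet itself is healthy, the order of operations matters. The sketch below prints one plausible sequence as a dry run and executes nothing against the cluster. The function name `rejoin_plan` is hypothetical, the 300-second timeout is an arbitrary choice, and the join command has to be the one printed by `kubeadm token create --print-join-command`.

```shell
#!/bin/sh
# Sketch only: prints the steps, does not run them.
# NODE defaults to the example node name used in this entry.
NODE="${NODE:-worker-node-1}"

# Print an ordered re-join plan for node $1.
rejoin_plan() {
  cat <<EOF
[control plane] kubectl delete node $1
[control plane] kubeadm token create --print-join-command
[node]          kubeadm reset
[node]          <run the printed kubeadm join command>
[control plane] kubectl wait --for=condition=Ready node/$1 --timeout=300s
EOF
}

rejoin_plan "$NODE"
```

The final `kubectl wait` is the check that the kubelet is posting status again; if it times out, the kubelet on the node still needs attention.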