kubernetes resource_error ai_generated true

Pod status: Evicted — The node was low on resource: disk-pressure

ID: kubernetes/pod-evicted-due-to-disk-pressure

Fix Rate: 88%
Confidence: 90%
Evidence: 1
First Seen: 2023-09-05

Version Compatibility

Version            Status    Introduced    Deprecated    Notes
kubernetes 1.26    active
kubernetes 1.27    active
kubernetes 1.28    active

Root Cause

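The kubelet monitors free space and free inodes on the node's filesystems (`nodefs`, plus `imagefs` when the container runtime stores images on a separate filesystem). When availability drops below an eviction threshold, the kubelet sets the `DiskPressure` node condition, first tries to reclaim space by garbage-collecting dead containers and unused images, and evicts pods if that is not enough. Thresholds are expressed as available capacity rather than usage: the Linux defaults include `nodefs.available<10%`, `imagefs.available<15%`, and `nodefs.inodesFree<5%`, which correspond to roughly 90% and 85% disk usage.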
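
To see how close a node is to an eviction threshold, the sketch below flags filesystems whose free space falls under a given percentage. It assumes POSIX `df -P` output on stdin and compares available blocks to total blocks, which only approximates the kubelet's own accounting. The function name `low_free_space` and the paths in the usage comment are illustrative assumptions, not part of Kubernetes.

```shell
# Print "<mount point> <free %>" for each filesystem whose free space is
# below the minimum percentage given as $1.
# Expects `df -P` output on stdin (columns: Filesystem, 1024-blocks, Used,
# Available, Capacity, Mounted on). Mount points containing spaces are not
# handled by this sketch.
low_free_space() {
  min_free="$1"
  awk -v min="$min_free" '
    NR > 1 && $2 > 0 {
      free = ($4 / $2) * 100
      if (free < min) printf "%s %.0f%%\n", $6, free
    }'
}

# Usage on a node (paths are assumptions; adjust to your kubelet and
# container runtime directories):
#   df -P /var/lib/kubelet    | low_free_space 10
#   df -P /var/lib/containerd | low_free_space 15
```

The values 10 and 15 mirror the default hard thresholds for `nodefs.available` and `imagefs.available`; a node with custom kubelet settings would need different numbers.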

Official Documentation

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/

Workarounds

  1. 90% success SSH into the node and run `df -h` to find the full filesystem (and `df -i` to rule out inode exhaustion). Free space by removing unused container images: `crictl rmi --prune` on containerd or CRI-O nodes, or `docker system prune -a` on Docker nodes, which also removes stopped containers, unused networks, and build cache. Then trim old journal logs: `journalctl --vacuum-size=500M`.
  2. 85% success Use node affinity or taints to keep the workload on a node with sufficient disk: `kubectl taint nodes node1 disk-pressure=true:NoSchedule`, then reschedule the pod on another node. While the `DiskPressure` condition is active, Kubernetes also applies the `node.kubernetes.io/disk-pressure:NoSchedule` taint automatically. The manual taint stays until you remove it with `kubectl taint nodes node1 disk-pressure=true:NoSchedule-`.
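
Evicted pods are not removed automatically; they remain as failed pod objects until they are deleted or garbage-collected. The sketch below picks them out of `kubectl` output so they can be reviewed or cleaned up after the node has recovered. It assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status, ...); `evicted_pods` is a hypothetical helper name.

```shell
# Print "<namespace> <name>" for every pod whose STATUS column is Evicted.
# Expects `kubectl get pods --all-namespaces --no-headers` output on stdin.
evicted_pods() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Usage against a live cluster (requires kubectl access; review the list
# before deleting anything):
#   kubectl get pods --all-namespaces --no-headers | evicted_pods
#   kubectl get pods --all-namespaces --no-headers | evicted_pods |
#     while read -r ns name; do kubectl delete pod -n "$ns" "$name"; done
```

Pods managed by a Deployment, StatefulSet, or similar controller are re-created by that controller, so deleting the evicted objects only clears the leftover records.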

Dead Ends

Common approaches that don't work:

  1. 95% fail Re-creating the pod without freeing disk space. The pod will be evicted again immediately if it lands on the same node and the node's disk pressure persists.
  2. 60% fail Relaxing the kubelet's eviction thresholds. This can lead to node instability and data loss; it only delays the problem.
  3. 90% fail Restarting the kubelet. This doesn't free disk space; the underlying disk usage issue remains.
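
Before re-creating a pod or restarting anything, it can help to confirm whether the node still reports disk pressure. A minimal sketch, assuming `kubectl` access to the cluster and the `node1` name used in the workarounds; `node_has_disk_pressure` is a hypothetical helper name.

```shell
# Exit 0 while the named node still reports DiskPressure=True,
# non-zero otherwise (including when the condition cannot be read).
node_has_disk_pressure() {
  status=$(kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}')
  [ "$status" = "True" ]
}

# Usage (requires kubectl access):
#   if node_has_disk_pressure node1; then
#     echo "node1 is still under disk pressure; free space first"
#   fi
```

If the check keeps succeeding after a cleanup, the node has not yet dropped back under its eviction threshold, and pods scheduled there remain at risk of eviction.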