DiskPressure cloud resource_error ai_generated true

PodEviction: The node had condition: [DiskPressure].

ID: cloud/azure-aks-pod-eviction-disk-pressure

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2024-01-10

Version compatibility

Version                      Status  Introduced  Deprecated  Notes
AKS 1.28                     active
AKS 1.29                     active
Kubernetes 1.27              active
Azure Linux Node Image 2024  active

Root cause analysis

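Local disk usage on an AKS node exceeds the kubelet eviction threshold (about 85% in this report; upstream kubelet defaults evict when `nodefs.available` drops below 10% or `imagefs.available` below 15%), so kubelet sets the DiskPressure condition and evicts pods to free space. The usual cause is container logs, images, or emptyDir volumes filling the OS disk.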
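
The threshold check described above can be sketched as follows. This is an illustration only, not kubelet's implementation: the helper name `disk_pressure` is hypothetical, and the 15% minimum-free fraction is an assumption chosen to mirror the 85% usage figure quoted here. The effective limit depends on the node's kubelet eviction settings.

```python
import shutil


def disk_pressure(total_bytes: int, free_bytes: int,
                  min_free_fraction: float = 0.15) -> bool:
    """Return True when the free fraction is below the minimum.

    Hypothetical helper; 0.15 is an assumed default, not a value read
    from any node's configuration.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


if __name__ == "__main__":
    # Check the filesystem that holds "/" on the machine running this script.
    usage = shutil.disk_usage("/")
    print(disk_pressure(usage.total, usage.free))
```

Run on a node (or in a debug pod with the host filesystem mounted), this gives a quick yes/no answer before deciding whether cleanup is needed.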

generic

Official documentation

https://learn.microsoft.com/en-us/azure/aks/manage-disk-usage

Solutions

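  1. Evict pods gracefully and clean up unused container images on the node. Drain the node with `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data`, then connect to it (for example over SSH) and remove unused images. AKS nodes on the versions listed here run containerd, so `crictl rmi --prune` is the usual tool; `nerdctl system prune -a --force` works only where nerdctl is installed, and `docker system prune -a --force` only on nodes that still run Docker. Finally, uncordon the node: `kubectl uncordon <node-name>`.
  2. Configure log rotation for container logs. Deploy a DaemonSet such as fluent-bit or filebeat to ship logs off the node, and cap local log storage through the kubelet settings `--container-log-max-size=10Mi --container-log-max-files=3` (`containerLogMaxSize` and `containerLogMaxFiles` in the kubelet configuration file). On AKS, kubelet settings are normally applied through custom node configuration on the node pool rather than by editing flags on the node.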
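
To see what the log settings in step 2 buy, the disk footprint of rotated container logs can be estimated as the size limit times the file count, per container. The sketch below is a rough approximation: `max_log_bytes` is a hypothetical helper, the container count of 110 is an assumed example value, and the bound is not exact because rotation happens on a periodic check and logs written outside the kubelet-managed container log files are not counted.

```python
MIB = 1024 * 1024


def max_log_bytes(containers: int, max_size_mib: int = 10,
                  max_files: int = 3) -> int:
    """Approximate ceiling on rotated container log usage, in bytes.

    Defaults mirror the 10Mi / 3 files settings quoted in step 2.
    """
    if containers < 0 or max_size_mib < 0 or max_files < 0:
        raise ValueError("arguments must be non-negative")
    return containers * max_size_mib * max_files * MIB


if __name__ == "__main__":
    # 110 containers is an illustrative assumption, not a measured value.
    print(max_log_bytes(110) // MIB, "MiB")
```

With those settings each container can hold roughly 30 MiB of logs, so the total scales linearly with the number of containers scheduled on the node.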

Ineffective attempts

Common but ineffective approaches:

  1. 95% failure rate

    Pods are recreated by controllers and the node remains under DiskPressure, causing immediate re-eviction.

  2. 80% failure rate

    New nodes may also fill up quickly if the root cause (e.g., log rotation misconfiguration) is not fixed, and the existing node remains under pressure.

  3. 90% failure rate

    Restarting kubelet does not free disk space; it only resets the pressure condition temporarily, until disk usage is rechecked.
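
Whichever approach is tried, the node's DiskPressure condition shows whether the pressure actually cleared. A minimal sketch for checking it from the JSON printed by `kubectl get nodes -o json` follows; the function name `nodes_with_disk_pressure` and the sample node names are hypothetical, and the sample document is trimmed to the fields the function reads.

```python
import json


def nodes_with_disk_pressure(nodes_json: str) -> list[str]:
    """Return names of nodes whose DiskPressure condition has status "True"."""
    doc = json.loads(nodes_json)
    names = []
    for node in doc.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "DiskPressure" and c.get("status") == "True"
               for c in conditions):
            names.append(node["metadata"]["name"])
    return names


# Hand-written sample in the shape of a NodeList; not captured from a cluster.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "DiskPressure", "status": "False"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "DiskPressure", "status": "True"}]}},
    ]
})

print(nodes_with_disk_pressure(SAMPLE))
```

Against a real cluster, the output of `kubectl get nodes -o json` would be passed in place of `SAMPLE`; an empty list means no node currently reports DiskPressure.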