# GKE node pool upgrade failed: Node had condition DiskPressure during upgrade

- **ID:** `cloud/gcp-kubernetes-node-pool-upgrade-failed-disk-pressure`
- **Domain:** cloud
- **Category:** system_error
- **Error Code:** `GKE.NodePoolUpgrade.DiskPressure`
- **Verification:** ai_generated
- **Fix Rate:** 75%

## Root Cause

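During a GKE node pool upgrade, nodes may fail to drain when local ephemeral storage (for example, container images, container logs, and `emptyDir` volumes) fills the node's boot disk past the kubelet's eviction thresholds. The kubelet then reports the `DiskPressure` condition, the node is tainted `node.kubernetes.io/disk-pressure:NoSchedule`, and the drain that the upgrade depends on can stall instead of completing.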
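To make the threshold logic concrete, here is a minimal Python sketch of a hard-eviction check of the kind the kubelet performs. It assumes the upstream kubelet default thresholds (`nodefs.available<10%`, `imagefs.available<15%`); GKE may configure different values, inode-based signals are left out, and the `FsStats` and `disk_pressure` names are hypothetical, not kubelet APIs.

```python
from dataclasses import dataclass

# Assumed upstream kubelet hard-eviction defaults: minimum free fraction per filesystem.
# GKE may configure different values; inode signals (nodefs.inodesFree) are omitted.
DEFAULT_MIN_FREE = {"nodefs": 0.10, "imagefs": 0.15}


@dataclass
class FsStats:
    """Hypothetical container for one filesystem's capacity and free bytes."""
    capacity_bytes: int
    available_bytes: int


def disk_pressure(nodefs: FsStats, imagefs: FsStats,
                  min_free: dict[str, float] = DEFAULT_MIN_FREE) -> bool:
    """Return True if any filesystem's free fraction is below its threshold."""
    for name, fs in (("nodefs", nodefs), ("imagefs", imagefs)):
        free_fraction = fs.available_bytes / fs.capacity_bytes
        if free_fraction < min_free[name]:
            return True  # the kubelet would report DiskPressure=True
    return False


GIB = 1024 ** 3
boot = FsStats(capacity_bytes=100 * GIB, available_bytes=5 * GIB)
print(disk_pressure(boot, boot))  # 5% free is below the 10% nodefs threshold
```

For example, a 100 GiB filesystem with 5 GiB free is 5% free, below the assumed 10% `nodefs` threshold, so `disk_pressure` returns `True`.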

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| GKE: >= 1.27 | active | — | — |
| Kubernetes: >= 1.24 | active | — | — |
| Google Cloud SDK: >= 400.0.0 | active | — | — |

## Workarounds

1. **Free disk space on the problematic node by removing unused container images with `crictl rmi --prune` (run on the node via SSH), then evict its pods manually with `kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data`. GKE versions in this range run containerd, so `docker image prune -a` is not available on the node.** (75% success)
   ```
   sudo crictl rmi --prune   # on the node (via SSH): remove images not used by any container
   kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
   ```
2. **Before starting the upgrade, configure a PodDisruptionBudget for critical workloads and make sure node disk usage is below 80%. `kubectl describe node <node-name> | grep DiskPressure` only shows whether the `DiskPressure` condition is set, not how full the disk is, so check usage with your monitoring or `df -h` on the node.** (80% success)
   ```
   kubectl create poddisruptionbudget <pdb-name> --selector=<label-selector> --min-available=1
   kubectl describe node <node-name> | grep DiskPressure
   ```
3. **Use a node pool with local SSDs or larger boot disks (for example, 200 GB instead of the 100 GB gcloud default) to provide more ephemeral storage, and keep container logs rotated so they don't build up. On containerd nodes, log rotation is handled by the kubelet (`containerLogMaxSize`, `containerLogMaxFiles`); `gcplogs` is a Docker logging driver and does not apply.** (85% success)
   ```
   gcloud container node-pools create <new-pool-name> --cluster=<cluster-name> --zone=<zone> --disk-size=200
   ```
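The pre-upgrade check in workaround 2 can be scripted. The sketch below is illustrative, assuming you have saved `kubectl get nodes -o json` and, for each node, the kubelet summary from `kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary`. The function names are hypothetical, and the 80% limit is this article's rule of thumb, not a GKE setting.

```python
import json

USAGE_LIMIT = 0.80  # this article's rule of thumb, not a GKE default


def load_json(path: str) -> dict:
    """Read a JSON document saved from kubectl."""
    with open(path) as f:
        return json.load(f)


def nodes_with_disk_pressure(node_list: dict) -> list[str]:
    """Names of nodes whose DiskPressure condition is currently True."""
    flagged = []
    for node in node_list.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == "DiskPressure" and cond.get("status") == "True":
                flagged.append(node["metadata"]["name"])
    return flagged


def nodefs_usage(summary: dict) -> float:
    """Fraction of the node filesystem in use, from a kubelet stats summary."""
    fs = summary["node"]["fs"]
    return fs["usedBytes"] / fs["capacityBytes"]


def upgrade_blockers(node_list: dict, summaries: dict[str, dict]) -> dict[str, str]:
    """Map node name -> reason the node should be cleaned up before upgrading."""
    blockers = {name: "DiskPressure=True" for name in nodes_with_disk_pressure(node_list)}
    for name, summary in summaries.items():
        usage = nodefs_usage(summary)
        if usage >= USAGE_LIMIT and name not in blockers:
            blockers[name] = f"nodefs {usage:.0%} used"
    return blockers
```

Typical use: `upgrade_blockers(load_json("nodes.json"), {"<node-name>": load_json("<node-name>.json")})`, then run workaround 1 on every node it returns before starting the upgrade.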
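The PodDisruptionBudget in workaround 2 limits how many pods `kubectl drain` may evict at a time, which is also why an overly strict PDB can hold up a drain. The sketch below shows the simplified arithmetic: evictions are allowed while the healthy pod count exceeds the desired healthy count. The function names are hypothetical, and rounding percentages up is an assumption of this sketch.

```python
import math
from typing import Optional, Union

IntOrPercent = Union[int, str]  # e.g. 2 or "50%", like a PDB spec field


def _resolve(value: IntOrPercent, expected_pods: int) -> int:
    """Turn an int-or-percentage into a pod count (percentages rounded up)."""
    if isinstance(value, str) and value.endswith("%"):
        return math.ceil(expected_pods * int(value[:-1]) / 100)
    return int(value)


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: Optional[IntOrPercent] = None,
                        max_unavailable: Optional[IntOrPercent] = None) -> int:
    """Simplified count of evictions a PDB currently permits."""
    if min_available is not None:
        desired_healthy = _resolve(min_available, expected_pods)
    elif max_unavailable is not None:
        desired_healthy = expected_pods - _resolve(max_unavailable, expected_pods)
    else:
        raise ValueError("set min_available or max_unavailable")
    # At 0, the Eviction API refuses further evictions and drain keeps retrying.
    return max(0, current_healthy - desired_healthy)
```

For example, with 3 healthy replicas and `--min-available=2`, one eviction is allowed; once a replica is down, the count drops to 0 until a replacement becomes healthy.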
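For the log-rotation part of workaround 3, a rough upper bound on rotated container logs per node is containers × `containerLogMaxSize` × `containerLogMaxFiles`. The sketch assumes the upstream kubelet defaults (10 MiB and 5 files; GKE may differ) and ignores compression of rotated files and the interval between rotation checks, so treat the result as approximate. The helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed upstream kubelet defaults; GKE node pools may use other values.
DEFAULT_MAX_SIZE = 10 * MIB  # containerLogMaxSize
DEFAULT_MAX_FILES = 5        # containerLogMaxFiles


def max_log_bytes(containers: int, max_size: int = DEFAULT_MAX_SIZE,
                  max_files: int = DEFAULT_MAX_FILES) -> int:
    """Approximate upper bound on rotated container log bytes on one node."""
    return containers * max_size * max_files


def log_share_of_disk(containers: int, disk_gib: int, **limits: int) -> float:
    """Fraction of a boot disk that fully rotated logs could occupy."""
    return max_log_bytes(containers, **limits) / (disk_gib * GIB)


print(f"{max_log_bytes(110) / GIB:.2f} GiB")  # hypothetical 110 containers
print(f"{log_share_of_disk(110, 100):.1%}")   # on a 100 GiB boot disk
```

With a hypothetical 110 containers and a 100 GiB disk, this prints about 5.37 GiB, or 5.4% of the disk.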

## Dead Ends

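- **Force-deleting stuck pods**: Force deletion can cause data loss for stateful workloads and doesn't fix the underlying disk pressure, which can recur on the new nodes. (60% fail)
- **Resizing the boot disk of the existing node pool**: The new size doesn't take effect until the node is recreated; during the upgrade, the old node still has DiskPressure and can't drain. (70% fail)
- **Changing kubelet settings directly on the node**: GKE managed node pools accept only a limited set of kubelet settings through node system configuration; editing the kubelet configuration on the node itself is unsupported and is lost when the node is recreated. (90% fail)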
