RESOURCE_EXHAUSTED cloud resource_error ai_generated true

Node pool upgrade failed: Resource exhausted: insufficient CPU available in zone us-central1-a

ID: cloud/gcp-gke-node-pool-upgrade-failed

Fix rate: 82%
Confidence: 86%
Evidence count: 1
First seen: 2024-09-05

Version compatibility

Version                    Status   Introduced   Deprecated   Notes
GKE: 1.28.5-gke.1500       active
Kubernetes: 1.28           active
Compute Engine: API v1     active

Root cause analysis

GKE cannot allocate new nodes during the upgrade because the specified zone has insufficient CPU quota or capacity to host the additional temporary nodes that the rolling update requires.
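
As a rough illustration of the quota side of this failure, the sketch below estimates whether the temporary nodes fit within the remaining CPU quota. The function names and all numbers are hypothetical, and it assumes each temporary node uses the same vCPU count as the existing nodes. It does not model physical zone capacity, which can cause the same error even when quota is available.

```python
def surge_cpus_needed(max_surge: int, vcpus_per_node: int) -> int:
    """vCPUs required by the temporary nodes created during a rolling upgrade."""
    return max_surge * vcpus_per_node


def upgrade_fits_quota(quota_limit: int, quota_usage: int,
                       max_surge: int, vcpus_per_node: int) -> bool:
    """Return True if the unused quota covers the temporary nodes."""
    headroom = quota_limit - quota_usage
    return surge_cpus_needed(max_surge, vcpus_per_node) <= headroom


# Hypothetical example: 24-vCPU quota, 22 in use, 1 temporary node with 4 vCPUs.
# The upgrade needs 4 vCPUs but only 2 are free, so this prints False.
print(upgrade_fits_quota(24, 22, max_surge=1, vcpus_per_node=4))
```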

generic

Official documentation

https://cloud.google.com/kubernetes-engine/docs/how-to/upgrading-a-cluster

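Solutions

  1. In the GCP console, request a Compute Engine CPU quota increase for the region that contains the affected zone: IAM & Admin > Quotas > 'CPU' > Edit Quotas.
  2. Use a surge upgrade in a different zone: add a node pool in a zone with sufficient available capacity, then migrate workloads to it.
  3. Temporarily reduce the number of replicas in the cluster to free up quota, then run the upgrade.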
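
Before choosing among these options, it can help to check how much CPU quota is actually left. The sketch below is a minimal example of one way to do that, assuming the gcloud CLI is installed and authenticated and that the JSON from `gcloud compute regions describe REGION --format=json` contains a `quotas` list with `metric`, `limit`, and `usage` fields. The helper names and the sample values are hypothetical.

```python
import json
import subprocess


def fetch_region(region: str) -> dict:
    """Run gcloud and return the region description as a dict (needs gcloud)."""
    result = subprocess.run(
        ["gcloud", "compute", "regions", "describe", region, "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def cpu_headroom(region_info: dict) -> float:
    """Return limit minus usage for the CPUS quota metric."""
    for quota in region_info.get("quotas", []):
        if quota.get("metric") == "CPUS":
            return quota["limit"] - quota["usage"]
    raise KeyError("CPUS quota not found in region description")


# Hypothetical sample shaped like the gcloud output, so the example runs offline.
sample = {"quotas": [{"metric": "CPUS", "limit": 24.0, "usage": 22.0}]}
print(cpu_headroom(sample))  # 2.0
```

With real data, `cpu_headroom(fetch_region("us-central1"))` would report the unused vCPUs in that region. A value smaller than the vCPU count of one temporary node suggests that the upgrade is likely to hit the quota limit again.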

Ineffective attempts

Common but ineffective approaches:

  1. 85% failure rate

    More nodes consume more quota, worsening the exhaustion; the upgrade needs additional quota for temporary nodes, not a larger pool.

  2. 60% failure rate

    Deletion frees quota but the new pool creation may still fail if zone capacity is insufficient at that time.

  3. 70% failure rate

    Smaller instances may not meet workload requirements; also, the zone may still lack capacity for any instance type.