kubernetes
runtime_error
ai_generated
true
Job has reached the specified backoff limit
ID: kubernetes/job-backoff-limit-exceeded
Fix rate: 85%
Confidence: 80%
Evidence count: 1
First seen: 2023-10-01
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Kubernetes v1.26.0 | active | — | — | — |
| Kubernetes v1.28.0 | active | — | — | — |
| Kubernetes v1.30.0 | active | — | — | — |
Root cause analysis
The Job's Pods failed more times than `backoffLimit` allows (the default is 6), so the Job controller stopped retrying and marked the Job as failed with reason `BackoffLimitExceeded`.
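One way to confirm this root cause is to read the reason recorded on the Job's `Failed` condition. The sketch below assumes `kubectl` is configured for the affected cluster; `job_failure_reason` is a hypothetical helper name and `my-job` is a placeholder. A Job that ran out of retries is expected to report `BackoffLimitExceeded`.

```shell
# Hypothetical helper: print the reason of the Job's "Failed" condition.
# Prints nothing if the Job has no Failed condition.
job_failure_reason() {
  kubectl get job "$1" -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
}

# Usage against a cluster (my-job is a placeholder name):
#   job_failure_reason my-job
```

`kubectl describe job my-job` shows the same information in its conditions and events, which may be easier to read interactively.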
Official documentation
https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy
Solutions
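- Inspect the logs of the failed Pods: list them with `kubectl get pods -l job-name=my-job`, then run `kubectl logs <pod-name>` on a failed one. `kubectl logs job/my-job` reads from a single Pod chosen by kubectl, and `--previous` only returns output when a container has restarted inside the same Pod, which requires `restartPolicy: OnFailure`.
- Fix the container command or image, then delete and recreate the Job: `kubectl delete job my-job && kubectl create job my-job --image=correct-image -- /correct-command`.
- If the failures are transient, raise the retry budget: set `backoffLimit: 10` under the Job's `spec`, and set `restartPolicy: OnFailure` under `spec.template.spec`, because `restartPolicy` is a Pod template field rather than a Job-level field.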
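The last step can be sketched as a manifest written from the shell. This is a minimal example, not the only valid layout: the file name `my-job.yaml`, the container name `main`, and the `correct-image` and `/correct-command` values are placeholders carried over from the commands above. It shows that `backoffLimit` sits directly under the Job's `spec`, while `restartPolicy` sits under the Pod template.

```shell
# Write a minimal Job manifest; all names, the image, and the command are placeholders.
cat > my-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  backoffLimit: 10              # Job-level: failures tolerated before the Job is marked Failed
  template:
    spec:
      restartPolicy: OnFailure  # Pod-level: a Job accepts only OnFailure or Never
      containers:
      - name: main
        image: correct-image
        command: ["/correct-command"]
EOF

# Then, against a cluster (not executed here):
#   kubectl delete job my-job --ignore-not-found
#   kubectl apply -f my-job.yaml
```

Deleting the old Job first avoids a name conflict and gives the new Job a fresh failure count.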
Ineffective attempts
Common but ineffective approaches:
- Increasing `backoffLimit` to a very high number without fixing the underlying Pod failure (70% failure rate). The Job still fails once the new limit is exhausted, because the root cause in the container remains.
- Deleting and recreating the Job with the same spec (90% failure rate). The same Pod failures repeat because the container image or command is still broken.