kubernetes runtime_error ai_generated true

Job has reached the specified backoff limit

ID: kubernetes/job-backoff-limit-exceeded


Fix Rate: 85%

Confidence: 80%

Evidence: 1

First Seen: 2023-10-01

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
Kubernetes v1.26.0	active	—	—	—
Kubernetes v1.28.0	active	—	—	—
Kubernetes v1.30.0	active	—	—	—

A Kubernetes Job's pods have failed more times than `backoffLimit` allows (the default is 6), so the Job controller stops retrying and marks the Job as failed with reason `BackoffLimitExceeded`.
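To confirm that a Job stopped for this reason, its `Failed` status condition can be read directly. The sketch below assumes `kubectl` is configured for the right cluster and namespace; the function name `job_failure_reason` and the Job name `my-job` are illustrative.

```shell
# Print the reason attached to a Job's "Failed" condition.
# For this error the expected value is BackoffLimitExceeded.
job_failure_reason() {
  local job="$1"
  kubectl get job "${job}" \
    --output=jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'
}
```

Calling `job_failure_reason my-job` prints nothing if the Job has no `Failed` condition yet.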

generic


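90% success Check the logs of the failed pods. List them with `kubectl get pods --selector=job-name=my-job`, then run `kubectl logs <pod-name>` on the most recent one. `kubectl logs job/my-job` picks one of the Job's pods for you; add `--previous` only when the container was restarted inside the same pod (`restartPolicy: OnFailure`). With `restartPolicy: OnFailure` the pod is terminated once the limit is reached, so use `restartPolicy: Never` while debugging to keep the failed pods and their logs.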
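When a Job has left several failed pods behind, it can help to dump the logs of all of them rather than one pod chosen by `kubectl`. The following is a minimal sketch, assuming the pods still exist and carry the `job-name` label that the Job controller adds; the function name `job_logs` is illustrative.

```shell
# Print the logs of every pod created by a Job.
# Pods are found through the job-name=<job> label set by the Job controller.
job_logs() {
  local job="$1" pod
  for pod in $(kubectl get pods --selector="job-name=${job}" --output=name); do
    echo "=== ${pod} ==="
    kubectl logs "${pod}" --all-containers
  done
}
```

For example, `job_logs my-job` prints one section per pod, which makes it easier to see whether every attempt failed the same way.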
85% success Fix the container command or image, then delete and recreate the Job, because a Job's pod template cannot be changed in place: `kubectl delete job my-job && kubectl create job my-job --image=correct-image -- /correct-command`.
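If the Job is defined in a manifest file rather than created imperatively, the same delete-and-recreate step might look like the sketch below. The function name `recreate_job`, the manifest path, and the 120-second timeout are assumptions chosen for illustration.

```shell
# Delete a Job (if it exists) and recreate it from a corrected manifest.
# Deleting the Job also removes the pods it created.
recreate_job() {
  local job="$1" manifest="$2"
  kubectl delete job "${job}" --ignore-not-found &&
    kubectl apply --filename="${manifest}" &&
    kubectl wait --for=condition=complete "job/${job}" --timeout=120s
}
```

`recreate_job my-job job.yaml` returns a non-zero status if the new Job does not complete within the timeout, which is a cue to inspect its pods again.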
70% success If the failure is transient, raise `backoffLimit` under the Job's `spec`, for example `backoffLimit: 10` (the default is 6). `restartPolicy` is set on the pod template (`spec.template.spec`) and must be `Never` or `OnFailure`; with `OnFailure` the failed container is restarted inside the same pod, and those restarts count toward the limit.
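The placement of the two fields is easy to get wrong, so here is a minimal sketch that writes a manifest with both set. The Job name, image, and command are placeholders, not values from a real workload.

```shell
# Write a Job manifest with an explicit retry budget.
# Name, image, and command are placeholders for illustration only.
cat > job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  backoffLimit: 10              # retries allowed before the Job is marked failed
  template:
    spec:
      restartPolicy: OnFailure  # belongs to the pod template, not the Job spec
      containers:
        - name: main
          image: busybox:1.36
          command: ["sh", "-c", "echo running && exit 0"]
EOF
```

The file can then be submitted with `kubectl apply -f job.yaml`.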


Common approaches that don't work:

Increasing `backoffLimit` to a very high number without fixing the underlying pod failure (70% fail)
The Job still fails once the new limit is exhausted, because the root cause in the container remains.
Deleting and recreating the Job with the same spec (90% fail)
The same pod failures repeat because the container image or command is still broken.