# Job has reached the specified backoff limit

- **ID:** `kubernetes/job-backoff-limit-exceeded`
- **Domain:** kubernetes
- **Category:** runtime_error
- **Verification level:** ai_generated
- **Fix rate:** 85%

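## Root cause

The number of failed Pods for the Kubernetes Job exceeded the value allowed by `backoffLimit`, so the Job controller stopped retrying and marked the Job as failed with reason `BackoffLimitExceeded`.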
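
To confirm that a Job failed for this reason rather than another one (for example a deadline), one option is to inspect its status conditions. The sketch below assumes the Job object has already been exported as JSON (for instance with `kubectl get job my-job -o json`). The helper name `backoff_limit_exceeded` and the `SAMPLE_JOB` data are hypothetical and only illustrate the shape of the check.

```python
import json


def backoff_limit_exceeded(job: dict) -> bool:
    """Return True if the Job carries a Failed condition with reason BackoffLimitExceeded."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Failed"
        and c.get("status") == "True"
        and c.get("reason") == "BackoffLimitExceeded"
        for c in conditions
    )


# Hypothetical, trimmed-down Job object used for illustration only.
SAMPLE_JOB = json.loads("""
{
  "metadata": {"name": "my-job"},
  "spec": {"backoffLimit": 6},
  "status": {
    "failed": 7,
    "conditions": [
      {
        "type": "Failed",
        "status": "True",
        "reason": "BackoffLimitExceeded",
        "message": "Job has reached the specified backoff limit"
      }
    ]
  }
}
""")

print(backoff_limit_exceeded(SAMPLE_JOB))  # prints True for the sample above
```

In practice the same function could be fed with `json.load(sys.stdin)` so that the `kubectl` output is piped straight into the script.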

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| Kubernetes v1.26.0 | active | — | — |
| Kubernetes v1.28.0 | active | — | — |
| Kubernetes v1.30.0 | active | — | — |

## Solutions

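1. Inspect the logs of the failed Pods. List them with `kubectl get pods -l job-name=my-job`, then run `kubectl logs <pod-name>` on a failed Pod. `kubectl logs job/my-job` reads from only one of the Job's Pods; add `--previous` when the container was restarted in place (`restartPolicy: OnFailure`) and you need the output of the prior attempt.
2. Fix the container command or image, then delete and recreate the Job, since the Pod template of an existing Job generally cannot be changed in place: `kubectl delete job my-job && kubectl create job my-job --image=correct-image -- /correct-command`.
3. If the failures are transient, raise `backoffLimit` (the default is 6) and let failed containers restart in place: set `spec.backoffLimit: 10` on the Job and `spec.template.spec.restartPolicy: OnFailure` in its Pod template. Note that `restartPolicy` belongs to the Pod template, not to the top level of the Job spec, and that with `OnFailure` the Pod is terminated once the limit is reached, which can make its logs harder to retrieve.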
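
The field placement in the last step is easy to get wrong, so here is a minimal sketch that builds a Job manifest with `backoffLimit` on the Job spec and `restartPolicy` inside the Pod template. The function name `build_job_manifest` is hypothetical, and `my-job`, `correct-image`, and `/correct-command` are the placeholder values used in the steps above, not real resources.

```python
import json


def build_job_manifest(name, image, command, backoff_limit=6, restart_policy="Never"):
    """Build a batch/v1 Job manifest as a plain dict.

    backoff_limit goes on the Job spec; restart_policy goes on the Pod template.
    """
    # Job Pods only accept these two restart policies.
    if restart_policy not in ("Never", "OnFailure"):
        raise ValueError("restartPolicy for a Job must be Never or OnFailure")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "restartPolicy": restart_policy,
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                }
            },
        },
    }


# Placeholder values taken from the steps above.
manifest = build_job_manifest(
    "my-job",
    "correct-image",
    ["/correct-command"],
    backoff_limit=10,
    restart_policy="OnFailure",
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -` after the old Job has been deleted.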

## Ineffective attempts

- **Increasing `backoffLimit` to a very high number without fixing the underlying Pod failure:** the Job still fails once the new limit is exhausted, because the root cause in the container remains. (70% failure rate)
- **Deleting and recreating the Job with the same spec:** the same Pod failures repeat because the container image or command is still broken. (90% failure rate)
