# Startup probe failed: HTTP probe returned status code: 503

- **ID:** `cloud/gcp-cloud-run-container-startup-probe-failure`
- **Domain:** cloud
- **Category:** runtime_error
- **Error code:** `503`
- **Verification level:** ai_generated
- **Fix rate:** 90%

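## Root cause

The Cloud Run startup probe endpoint returns a non-2xx status (503) during the initial startup period. This usually happens because the application takes longer to initialize than the probe's `initialDelaySeconds`, `periodSeconds`, and `failureThreshold` settings allow.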
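As a rough aid for sizing the probe, the time a container has to start answering successfully can be estimated from the probe settings. The sketch below assumes the usual Kubernetes-style reading (the first probe fires after `initialDelaySeconds`, then up to `failureThreshold` attempts spaced `periodSeconds` apart); the function name is hypothetical and the result is an approximation, not an exact Cloud Run guarantee.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate seconds available before the startup probe gives up.

    Assumption: budget = initial delay + (failure threshold * period).
    Probe timeouts and scheduling jitter are ignored in this estimate.
    """
    return initial_delay + period * failure_threshold


# Illustrative values: 60 s delay, 10 s period, 6 allowed failures.
print(startup_budget_seconds(60, 10, 6))  # 120
```

If the measured startup time of the application is close to or above this estimate, the probe settings are likely too tight.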

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| Cloud Run (managed) 2024 | active | — | — |
| Knative Serving 1.12 | active | — | — |
| gcloud CLI 462 | active | — | — |

## Solutions

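1. Increase the startup probe's time budget (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) to match the application's startup time:

   ```
   gcloud run services update my-service --startup-probe=httpGet.path=/health,initialDelaySeconds=60,periodSeconds=10,failureThreshold=6
   ```

   Confirm the flag syntax against `gcloud run services update --help` for your gcloud version. Also ensure the `/health` endpoint returns 200 only after full initialization.
2. Implement a health check endpoint that returns 503 until the application is ready, then 200. Example in Python Flask:

   ```python
   from flask import Flask

   app = Flask(__name__)
   app_ready = False  # set to True once initialization completes

   @app.route('/health')
   def health():
       # Report 503 while still initializing, 200 once ready.
       return ('OK', 200) if app_ready else ('Service Unavailable', 503)
   ```

   Set `app_ready = True` after initialization completes.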
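The readiness pattern from step 2 only helps if the server is already listening while initialization runs, so that the probe gets a 503 instead of a refused connection. A minimal standard-library sketch of that ordering follows; `HealthHandler`, `initialize`, and `start_server` are hypothetical names, and the body of `initialize` is a placeholder for the application's real startup work.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Readiness flag shared between the initializer and the probe handler.
app_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # 503 while initializing, 200 once the flag is set.
        if app_ready.is_set():
            status, body = 200, b"OK"
        else:
            status, body = 503, b"Service Unavailable"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of stderr in this sketch


def initialize():
    """Placeholder for slow startup work (for example, loading a model)."""
    # ... real initialization would go here ...
    app_ready.set()


def start_server(port=8080):
    """Serve in a background thread so probes are answered during startup."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real service, one would call `start_server` with the port from the `PORT` environment variable first, then call `initialize()`, and keep the main thread alive afterwards.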

## Failed attempts

- **Increasing CPU or memory:** A resource increase may not reduce startup time if the application has a fixed initialization delay (e.g., loading ML models). (60% failure rate)
- **Removing the startup probe:** Without a configured startup probe, Cloud Run falls back to its default check on the container port, which does not verify that the application has finished initializing; the container may still be killed or receive traffic before it is ready. (80% failure rate)
- **Shortening the probe period:** Too short a period causes rapid retries that may overwhelm the application during startup, worsening the issue. (75% failure rate)
