Startup probe failed: HTTP probe failed with statuscode: 503
ID: cloud/gcp-cloud-run-container-startup-probe-failure
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Cloud Run (managed) 2024 | active | — | — | — |
| Knative Serving 1.12 | active | — | — | — |
| gcloud CLI 462 | active | — | — | — |
Root Cause
Cloud Run's startup probe is receiving a non-2xx status (503) from the container's health endpoint during the startup period, usually because the application takes longer to initialize than the window allowed by the probe's initialDelaySeconds, periodSeconds, and failureThreshold settings.
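To make the timing concrete, the following is a rough sketch of how the probe settings translate into a startup window. It assumes Kubernetes-style probe semantics (the first check after initialDelaySeconds, then one check every periodSeconds until failureThreshold consecutive failures), and it ignores timeoutSeconds, so treat the result as an approximation. The function names are hypothetical; the values 60, 10, and 6 are the ones used in the gcloud command in this entry.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time the container has before the startup probe gives up.

    Assumption: initial delay plus one probe period per allowed failure.
    Probe timeouts are ignored, so the real window may differ slightly.
    """
    return initial_delay + period * failure_threshold


def fits_in_budget(app_startup_seconds: float, initial_delay: int,
                   period: int, failure_threshold: int) -> bool:
    """Return True if the measured startup time fits inside the approximate window."""
    return app_startup_seconds <= startup_budget_seconds(
        initial_delay, period, failure_threshold
    )


if __name__ == "__main__":
    # Values from the example command: 60 s delay, 10 s period, 6 allowed failures.
    print(startup_budget_seconds(60, 10, 6))
    # A hypothetical application that needs 90 s to initialize.
    print(fits_in_budget(90, 60, 10, 6))
```

If the measured startup time does not fit, raising failureThreshold or periodSeconds widens the window without delaying the first check.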
Official Documentation
https://cloud.google.com/run/docs/configuring/healthchecks#startup-probes
Workarounds
- 90% success: Increase the startup probe's initialDelaySeconds to match the application's startup time: `gcloud run services update my-service --startup-probe-initial-delay=60 --startup-probe-period=10 --startup-probe-failure-threshold=6`. Confirm the exact flag names with `gcloud run services update --help` before running the command. Also ensure the /health endpoint returns 200 only after full initialization.
- 95% success: Implement a health check endpoint that returns 503 until the application is ready, then 200. Example in Python Flask: decorate a `health()` function with `@app.route('/health')` and have it return `('OK', 200)` if `app_ready` is true, otherwise `('Service Unavailable', 503)`. Set `app_ready = True` after initialization completes.
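The readiness-gated endpoint described above can be sketched as follows. To keep the sketch free of dependencies it uses Python's standard-library `http.server` instead of Flask, and a `threading.Event` in place of the `app_ready` boolean; the `initialize`, `make_server`, and `main` names are hypothetical, and the slow startup work is only a placeholder.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once initialization has finished; plays the role of the app_ready flag.
app_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        ready = app_ready.is_set()
        body = b"OK" if ready else b"Service Unavailable"
        # 503 until initialization completes, then 200.
        self.send_response(200 if ready else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the logs


def initialize() -> None:
    """Hypothetical placeholder for slow startup work (model loading, cache warm-up)."""
    app_ready.set()


def make_server(port: int) -> ThreadingHTTPServer:
    """Bind the health server on all interfaces at the given port."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


def main() -> None:
    # Cloud Run passes the listening port in the PORT environment variable.
    server = make_server(int(os.environ.get("PORT", "8080")))
    # Start listening immediately and initialize in the background,
    # so the probe sees 503 rather than a refused connection.
    threading.Thread(target=initialize, daemon=True).start()
    server.serve_forever()
```

Calling `main()` from the container entrypoint starts the server. Listening before initialization finishes is a design choice: the probe gets an explicit 503 during startup, and the first 200 marks the point at which the service is ready for traffic.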
Dead Ends
Common approaches that don't work:
- 60% fail: Resource increase may not reduce startup time if the application has a fixed initialization delay (e.g., loading ML models).
- 80% fail: Removing the startup probe does not help long-running services. Without a custom probe, Cloud Run falls back to its default TCP startup check, and the container may still be killed before it finishes initializing.
- 75% fail: Too short a probe period causes rapid retries that may overwhelm the application during startup, worsening the issue.