Tags: 503 · cloud · runtime_error · ai_generated: true

Startup probe failed: HTTP probe failed with statuscode: 503

ID: cloud/gcp-cloud-run-container-startup-probe-failure

Fix rate: 90%
Confidence: 86%
Evidence: 1
First seen: 2024-02-28

Version Compatibility

Version                     Status   Introduced   Deprecated   Notes
Cloud Run (managed) 2024    active
Knative Serving 1.12        active
gcloud CLI 462              active

Root Cause

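Cloud Run's startup probe is receiving a non-2xx status (503) from the container's health endpoint during startup. After initialDelaySeconds, the probe runs every periodSeconds, and once failureThreshold consecutive probes have failed, Cloud Run treats the container as having failed to start. This usually happens because the application takes longer to initialize than that window (at most roughly initialDelaySeconds + failureThreshold × periodSeconds) allows.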
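The timing rule can be sketched as plain arithmetic. This is a simplified model, assuming probes fire exactly at initialDelaySeconds + k × periodSeconds and ignoring timeoutSeconds; the helper names are hypothetical and not part of any Google Cloud library. In this model the last probe that can still succeed fires at initialDelaySeconds + (failureThreshold - 1) × periodSeconds.

```python
import math


def last_probe_time(initial_delay: int, period: int, failure_threshold: int) -> int:
    # Probes fire at initial_delay, initial_delay + period, ...; after
    # failure_threshold consecutive failures the container is failed, so this
    # is the latest probe that can still observe a ready app (timeouts ignored).
    return initial_delay + (failure_threshold - 1) * period


def min_failure_threshold(startup_seconds: float, initial_delay: int, period: int) -> int:
    # Probes that fire before the app is ready will fail (503).
    failing_probes = max(0, math.ceil((startup_seconds - initial_delay) / period))
    # One more probe is needed to actually see the 200.
    return failing_probes + 1


print(last_probe_time(60, 10, 6))          # 110
print(min_failure_threshold(100, 60, 10))  # 5
```

For the values used in the first workaround below (60 s delay, 10 s period, threshold 6), this model tolerates a startup of up to about 110 seconds; an app that needs 100 seconds would need a failureThreshold of at least 5 with the same delay and period.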

generic

Official Documentation

https://cloud.google.com/run/docs/configuring/healthchecks#startup-probes

Workarounds

  1. 90% success: Increase the startup probe's initialDelaySeconds (and failureThreshold if needed) so the probe window covers the application's real startup time, for example: `gcloud run services update my-service --startup-probe-initial-delay=60 --startup-probe-period=10 --startup-probe-failure-threshold=6`. Check these flag names against `gcloud run services update --help` for your gcloud version; the same settings can also be expressed as the startupProbe fields initialDelaySeconds, periodSeconds and failureThreshold in the service YAML. Also ensure the /health endpoint returns 200 only after initialization has fully completed.
  2. 95% success: Implement a health-check endpoint that returns 503 until the application is ready and 200 afterwards. In Python Flask, register a `health()` view with `@app.route('/health')` that returns `('OK', 200) if app_ready else ('Service Unavailable', 503)`, then set the module-level `app_ready = True` once initialization completes (declaring it `global` if this happens inside a function).
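The second workaround can be expanded into a runnable sketch. It assumes Flask is installed and uses a `threading.Event` instead of a bare `app_ready` global, so a background thread can flip the flag without a `global` declaration. `initialize()` and its sleep are hypothetical stand-ins for real startup work such as loading a model. The web server can start listening immediately, and /health answers 503 until `initialize()` finishes.

```python
import threading
import time

from flask import Flask

app = Flask(__name__)

# Set once initialization has finished; readable from any request thread.
ready = threading.Event()


def initialize() -> None:
    # Hypothetical placeholder for slow startup work (model loading,
    # cache warm-up, connection pools, ...).
    time.sleep(10)
    ready.set()


# Start initialization in the background at import time so the server
# can bind its port and answer probes while work is still in progress.
threading.Thread(target=initialize, daemon=True).start()


@app.route("/health")
def health():
    # 503 until initialization completes, then 200.
    if ready.is_set():
        return "OK", 200
    return "Service Unavailable", 503
```

Assuming the file is saved as main.py, it could be served on Cloud Run with `gunicorn --bind :$PORT main:app`, with the startup probe's HTTP path pointed at /health. Starting the thread at import time (rather than in a `__main__` block) matters here, because a WSGI server such as gunicorn imports the module and never runs `__main__`.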

Dead Ends

Common approaches that don't work:

  1. 60% fail: increasing CPU or memory

    Adding resources may not reduce startup time if the application has a fixed initialization delay (e.g., loading ML models).

  2. 80% fail: removing the startup probe

    Cloud Run requires a startup probe for long-running services; removing it may cause the container to be killed before it finishes initializing.

  3. 75% fail: shortening the probe period

    A period that is too short causes rapid retries that may overwhelm the application during startup, worsening the issue.
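The first and third dead ends can be illustrated with a simplified probe timeline. This sketch assumes the application becomes ready at a fixed `ready_at` second regardless of CPU or memory (the fixed-delay case from the first dead end), that probes fire exactly every `period` seconds after `initial_delay`, and that timeouts can be ignored; `probe_outcome` is a hypothetical helper. It also shows a second effect of a short period: with failureThreshold unchanged, shortening periodSeconds shrinks the total time the probe tolerates.

```python
from __future__ import annotations


def probe_outcome(ready_at: float, initial_delay: float, period: float,
                  failure_threshold: int) -> tuple[str, float]:
    """Simulate a startup probe; return ("ready", t) or ("killed", t)."""
    failures = 0
    t = initial_delay
    while True:
        if t >= ready_at:
            # The app is up, so this probe gets a 200 and startup succeeds.
            return ("ready", t)
        # The app is still initializing and answers 503.
        failures += 1
        if failures >= failure_threshold:
            # failure_threshold consecutive failures: the container is stopped.
            return ("killed", t)
        t += period


# Same app (ready after 75 s), same threshold, different probe periods.
print(probe_outcome(75, 60, 10, 3))  # ('ready', 80)
print(probe_outcome(75, 60, 2, 3))   # ('killed', 64)
```

Extra resources only help in this model if they move `ready_at` earlier; when initialization time is fixed, only the probe parameters or the endpoint's readiness logic change the outcome.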