503 cloud runtime_error ai_generated true

Startup probe failed: HTTP probe failed with statuscode: 503

ID: cloud/gcp-cloud-run-container-startup-probe-failure

Fix rate: 90%
Confidence: 86%
Evidence count: 1
First seen: 2024-02-28

Version compatibility

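| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| Cloud Run (managed) 2024 | active | | | |
| Knative Serving 1.12 | active | | | |
| gcloud CLI 462 | active | | | |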

Root cause analysis

Cloud Run's startup probe endpoint returns a non-2xx status (503) during the initial startup period, usually because the application takes longer to initialize than the probe window allows. That window is set by the probe's initialDelaySeconds, periodSeconds, and failureThreshold together.
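
To make the size of that window concrete, here is a rough worked calculation. It assumes the common rule of thumb that an application has about initialDelaySeconds plus failureThreshold times periodSeconds to start answering with 2xx; it ignores timeoutSeconds, and the exact cutoff may differ by up to one period. The function names are hypothetical and only illustrate the arithmetic.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time the app has before the startup probe gives up.

    Rule-of-thumb estimate only: initial delay plus one period per allowed failure.
    """
    return initial_delay + period * failure_threshold


def probe_covers_startup(startup_seconds: float, initial_delay: int,
                         period: int, failure_threshold: int) -> bool:
    """True if the measured startup time fits inside the estimated budget."""
    return startup_seconds <= startup_budget_seconds(initial_delay, period, failure_threshold)


# A hypothetical app that needs 90 s to load, checked against two probe configurations.
print(probe_covers_startup(90, initial_delay=0, period=10, failure_threshold=3))
print(probe_covers_startup(90, initial_delay=60, period=10, failure_threshold=6))
```

Measuring the real startup time first (for example from container logs) and leaving some headroom above it is usually safer than guessing the probe values.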

generic

Official documentation

https://cloud.google.com/run/docs/configuring/healthchecks#startup-probes

Solutions

  1. Give the startup probe enough time to cover application startup by raising initialDelaySeconds and, if needed, periodSeconds and failureThreshold: `gcloud run services update my-service --startup-probe=httpGet.path=/health,initialDelaySeconds=60,periodSeconds=10,failureThreshold=6`. Also ensure the /health endpoint returns 200 only after full initialization.
  2. Implement a health check endpoint that returns 503 until the application is ready, then 200. In Python Flask, for example, the `/health` handler can return `('OK', 200) if app_ready else ('Service Unavailable', 503)`, with `app_ready` set to `True` once initialization completes.
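
The readiness-gated endpoint from solution 2 can be sketched without any framework. The example below uses only the Python standard library instead of Flask, so it is a stand-in for the pattern and not the Flask code itself. The names `HealthHandler`, `initialize`, and `serve` are hypothetical, and the `time.sleep` call is a placeholder for real startup work such as loading a model. The server starts listening immediately and runs initialization in a background thread, so the probe gets a 503 response, not a refused connection, while the app warms up.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

app_ready = threading.Event()  # set once initialization has finished


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # 503 until the app is ready, 200 afterwards.
        ready = app_ready.is_set()
        body = b"OK" if ready else b"Service Unavailable"
        self.send_response(200 if ready else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the log


def initialize(delay_seconds: float = 0.0) -> None:
    """Placeholder for slow startup work; flips the readiness flag when done."""
    time.sleep(delay_seconds)
    app_ready.set()


def serve(port: int = 8080, init_delay: float = 0.0) -> ThreadingHTTPServer:
    """Start listening right away and initialize in the background."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    threading.Thread(target=initialize, args=(init_delay,), daemon=True).start()
    return server
```

In a container entrypoint this could be started with `serve(int(os.environ.get("PORT", "8080")))`, followed by something that keeps the main thread alive, since both worker threads are daemons.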

Ineffective attempts

Common approaches that do not work:

  1. 60% failure rate

    Allocating more resources (for example, CPU or memory) may not shorten startup if the application has a fixed initialization cost, such as loading ML models.

  2. 80% failure rate

    Removing the custom startup probe does not remove the startup check: Cloud Run falls back to a default TCP startup probe, so a slow-starting container may still be killed before it finishes initializing.

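  3. 75% failure rate

    Shortening periodSeconds too far causes rapid retries that may overwhelm the application during startup. With an unchanged failureThreshold it also shrinks the total startup window, which makes the failure more likely.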