Tags: UNAVAILABLE, communication, network_error, ai_generated, partial

grpc::UNAVAILABLE: No healthy upstream endpoints

ID: communication/grpc-unavailable-no-healthy-upstream

Fix Rate: 82%
Confidence: 88%
Evidence: 1
First Seen: 2023-06-15

Version Compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| gRPC 1.48 | active | | | |
| Envoy 1.26 | active | | | |
| Kubernetes 1.28 | active | | | |
| Istio 1.18 | active | | | |

Root Cause

The gRPC client fails to connect because the load balancer or service registry reports zero healthy backends for the target service.
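
In a Kubernetes setup, "zero healthy backends" typically shows up as an Endpoints object with nothing under `addresses` and every pod listed under `notReadyAddresses`. The following is a minimal sketch, assuming the JSON printed by `kubectl get endpoints <service-name> -o json` is available as a byte slice; the `countBackends` helper, the type names, and the sample IPs are hypothetical and only illustrate the check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// address and endpoints mirror only the Endpoints fields this check needs.
type address struct {
	IP string `json:"ip"`
}

type endpoints struct {
	Subsets []struct {
		Addresses         []address `json:"addresses"`
		NotReadyAddresses []address `json:"notReadyAddresses"`
	} `json:"subsets"`
}

// countBackends returns how many addresses are ready and how many are not.
func countBackends(raw []byte) (ready, notReady int, err error) {
	var ep endpoints
	if err = json.Unmarshal(raw, &ep); err != nil {
		return 0, 0, err
	}
	for _, s := range ep.Subsets {
		ready += len(s.Addresses)
		notReady += len(s.NotReadyAddresses)
	}
	return ready, notReady, nil
}

func main() {
	// Hypothetical sample: two pods exist, neither passes its readiness probe.
	sample := []byte(`{"subsets":[{"notReadyAddresses":[{"ip":"10.0.0.7"},{"ip":"10.0.0.8"}]}]}`)
	ready, notReady, err := countBackends(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("ready=%d notReady=%d\n", ready, notReady) // prints "ready=0 notReady=2"
}
```

A result of `ready=0` with a non-zero `notReady` points at failing readiness probes; zero in both counts suggests that the service selector matches no pods at all.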

Official Documentation

https://grpc.io/docs/guides/error-handling/

Workarounds

  1. (75% success) Verify backend health via `kubectl get endpoints -n <namespace> <service-name>` or an equivalent service registry query. An empty endpoints list means that no ready pod backs the service. Then restart the deployment's pods: `kubectl rollout restart deployment/<deployment-name> -n <namespace>`.
  2. (70% success) Add a retry with backoff in the gRPC client using a middleware such as `grpc_retry` in Go: `import grpc_retry "github.com/grpc-ecosystem/go-grpc-middleware/retry"; opts := []grpc_retry.CallOption{grpc_retry.WithMax(3), grpc_retry.WithBackoff(grpc_retry.BackoffLinear(100 * time.Millisecond))}`, then pass `grpc.WithUnaryInterceptor(grpc_retry.UnaryClientInterceptor(opts...))` when dialing. Retries only bridge short outages; they do not help while the backend pool stays empty.
  3. (65% success) Relax the readiness probe in the Kubernetes deployment spec, for example `readinessProbe.periodSeconds: 10` and `readinessProbe.failureThreshold: 5`, so that a pod tolerates more consecutive probe failures before it is removed from the endpoints. For backends that are slow to start, a `startupProbe` or a larger `initialDelaySeconds` is what gives them more time to become healthy.

Dead Ends

Common approaches that don't work:

  1. Restart the client application to force a new connection (95% fail)

    Restarting the gRPC client does not fix the root cause of unhealthy backends; the client will re-encounter the same error until the backend pool recovers.

  2. Disable TLS/SSL on the gRPC channel (85% fail)

    Disabling TLS removes encryption but does not address backend health; the error stems from upstream unavailability, not protocol negotiation.

  3. Change the target port to 443 or another arbitrary number (90% fail)

    Pointing the client at an arbitrary port moves it away from the port the service actually listens on, so the call fails outright instead of reaching the unhealthy backend pool. The backends themselves remain unhealthy.