cudaErrorMpsConnectionFailed (805)
cuda
runtime_error
ai_generated
true
CUDA error: MPS client failed to connect (cudaErrorMpsConnectionFailed)
ID: cuda/cuda-error-mps-client-failed
Fix Rate: 82%
Confidence: 88%
Evidence: 1
First Seen: 2024-06-10
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 12.0 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
| CUDA 12.3 | active | — | — | — |
| NVIDIA Driver 535.129.03 | active | — | — | — |
| NVIDIA Driver 545.23.06 | active | — | — | — |
Root Cause
The CUDA Multi-Process Service (MPS) control daemon is not running or has crashed, preventing a new MPS client from connecting to the shared GPU context.
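As a rough diagnostic for the condition described above, the following sketch reports which pipe directory a client would use and whether a control daemon process appears to be running. The helper names (`mps_pipe_dir`, `has_mps_control`, `mps_daemon_running`) are made up for this example, and `/tmp/nvidia-mps` is assumed as the default pipe directory per the MPS documentation. A matching process does not prove the daemon is healthy or that it serves the same pipe directory as the client.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any NVIDIA tool.

# Print the pipe directory a client would use.
# Assumption: /tmp/nvidia-mps is the default when the variable is unset.
mps_pipe_dir() {
  printf '%s\n' "${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"
}

# Read command lines on stdin; succeed if any has
# nvidia-cuda-mps-control as the basename of its first word.
has_mps_control() {
  awk '{ n = split($1, parts, "/")
         if (parts[n] == "nvidia-cuda-mps-control") found = 1 }
       END { exit(found ? 0 : 1) }'
}

# Check the live process table for the control daemon.
mps_daemon_running() {
  ps -eo args= | has_mps_control
}

if mps_daemon_running; then
  echo "MPS control daemon process found; clients will look in $(mps_pipe_dir)"
else
  echo "No MPS control daemon process found; clients would look in $(mps_pipe_dir)"
fi
```

Matching on the executable's basename rather than grepping the whole command line avoids false positives from shells or editors that merely mention the daemon's name.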
Official Documentation
https://docs.nvidia.com/deploy/mps/index.html
Workarounds
- 90% success: Start the MPS control daemon before launching the application, and launch the application with the same CUDA_MPS_PIPE_DIRECTORY value the daemon was started with: `export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps; nvidia-cuda-mps-control -d`
- 95% success: Bypass MPS by unsetting CUDA_MPS_PIPE_DIRECTORY and restarting the process: `unset CUDA_MPS_PIPE_DIRECTORY`. With the variable unset, the client falls back to the default pipe directory, /tmp/nvidia-mps, so this helps only if no broken daemon is using that default location.
- 85% success: Check whether the MPS daemon is running and restart it: `ps aux | grep nvidia-cuda-mps-control; killall nvidia-cuda-mps-control; nvidia-cuda-mps-control -d`. Where the daemon still responds, `echo quit | nvidia-cuda-mps-control` is the documented way to shut it down before restarting.
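The restart workaround can be wrapped in a small function. This is a minimal sketch, not an official procedure: `restart_mps` and the `MPS_CONTROL` override are hypothetical names introduced here so the function can be exercised with a stub, and `/tmp/nvidia-mps` is assumed as the pipe directory when none is set. It uses only the `quit` command and the `-d` flag of `nvidia-cuda-mps-control`.

```shell
#!/usr/bin/env bash
# Sketch only: MPS_CONTROL is a hypothetical override for testing with a stub.
MPS_CONTROL="${MPS_CONTROL:-nvidia-cuda-mps-control}"

restart_mps() {
  # Bail out early if the control binary is not available.
  if ! command -v "$MPS_CONTROL" >/dev/null 2>&1; then
    echo "error: $MPS_CONTROL not found in PATH" >&2
    return 127
  fi

  # Pin the pipe directory so the daemon and later clients agree on it.
  export CUDA_MPS_PIPE_DIRECTORY="${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"

  # Ask a running daemon to shut down; ignore failure if none is running.
  echo quit | "$MPS_CONTROL" >/dev/null 2>&1 || true

  # Start a fresh control daemon in background (daemon) mode.
  "$MPS_CONTROL" -d
}
```

Calling `restart_mps` and then launching the application from the same shell keeps CUDA_MPS_PIPE_DIRECTORY identical for the daemon and the client. If the old daemon is hung and ignores `quit`, a forced kill as in the workaround above may still be needed.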
Dead Ends
Common approaches that don't work:
- 80% fail: Reinstalling CUDA. The error is not caused by a missing or corrupt CUDA installation, but by a missing or unresponsive MPS daemon process.
- 70% fail: Changing CUDA_MPS_PIPE_DIRECTORY on its own. The variable only changes where the client looks for the daemon's pipes and sockets; if no daemon is running with that directory, the connection still fails.