# CUDA Error: MPS Client Connection Failed (cudaErrorMpsClientFailed)

- **ID:** `cuda/cuda-error-mps-client-failed`
- **Domain:** cuda
- **Category:** runtime_error
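- **Error code:** `cudaErrorMpsClientFailed (803)` (in the CUDA Runtime API, a failed MPS client connection is reported as `cudaErrorMpsConnectionFailed`, value 805; value 803 is `cudaErrorSystemDriverMismatch`)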
- **Verification level:** ai_generated
- **Fix rate:** 82%

## Root Cause

The CUDA Multi-Process Service (MPS) control daemon is not running or has crashed, so new MPS clients cannot connect to the shared GPU context.
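
To confirm this root cause, it helps to check whether the control daemon process actually exists. The following is a minimal sketch, not an official NVIDIA tool: `has_mps_control` is a hypothetical helper name, and it assumes a `ps` that supports `-eo args=`. It matches on the executable name in the first field, so a `grep` or shell process that merely mentions the daemon's name is not counted.

```shell
#!/usr/bin/env bash
# Sketch: detect a running MPS control daemon from a process listing.
# has_mps_control is a hypothetical helper name, not part of any NVIDIA tooling.

# Reads "ps -eo args=" style lines on stdin and succeeds if the executable
# (first field) of any line is nvidia-cuda-mps-control.
has_mps_control() {
  awk '$1 ~ /(^|\/)nvidia-cuda-mps-control$/ { found = 1 } END { exit !found }'
}

if ps -eo args= | has_mps_control; then
  echo "MPS control daemon is running"
else
  echo "MPS control daemon is not running"
fi
```

A plain `ps aux | grep nvidia-cuda-mps-control` can report a match even when the daemon is down, because the `grep` process itself appears in the listing; filtering on the first field avoids that false positive.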

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| CUDA 12.0 | active | — | — |
| CUDA 12.1 | active | — | — |
| CUDA 12.3 | active | — | — |
| NVIDIA Driver 535.129.03 | active | — | — |
| NVIDIA Driver 545.23.06 | active | — | — |

## Solutions

1. Start the MPS control daemon before launching the application:
   ```
   export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
   nvidia-cuda-mps-control -d
   ```
2. Disable MPS by unsetting `CUDA_MPS_PIPE_DIRECTORY` and restarting the process. Clients then fall back to the default pipe directory, `/tmp/nvidia-mps`, so this bypasses MPS only if no daemon is serving that default path:
   ```
   unset CUDA_MPS_PIPE_DIRECTORY
   ```
3. Check whether the MPS daemon is running, then restart it:
   ```
   ps aux | grep nvidia-cuda-mps-control
   killall nvidia-cuda-mps-control
   nvidia-cuda-mps-control -d
   ```
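
The restart steps above can be wrapped in a small script. This is a sketch under stated assumptions rather than a definitive procedure: `run`, `restart_mps`, and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and the script defaults to printing the commands instead of executing them so that it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: restart the MPS control daemon for one pipe directory.
# run, restart_mps and DRY_RUN are hypothetical names, not NVIDIA tooling.

# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

restart_mps() {
  mps_pipe_dir="${1:-/tmp/nvidia-mps}"
  # Clients launched from this shell must use the same directory as the daemon.
  export CUDA_MPS_PIPE_DIRECTORY="$mps_pipe_dir"
  # Stop any stale daemon; ignore the error when none is running.
  run killall nvidia-cuda-mps-control || true
  # Start a fresh daemon in the background (-d).
  run nvidia-cuda-mps-control -d
}

restart_mps "${1:-/tmp/nvidia-mps}"
```

Running the script as-is only prints the two commands; setting `DRY_RUN=0` executes them. NVIDIA's MPS documentation also describes a graceful shutdown via `echo quit | nvidia-cuda-mps-control`; the sketch keeps `killall` to mirror the steps listed above.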

## Ineffective Attempts

- **Reinstalling CUDA:** The error is not caused by a missing or corrupt CUDA installation, but by a missing or unresponsive MPS daemon process. (80% failure rate)
- **Changing only `CUDA_MPS_PIPE_DIRECTORY`:** The environment variable only changes the socket path; if the daemon is not running at that path, the connection still fails. (70% failure rate)
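
Because the environment variable only selects a path, a quick sanity check is to see which pipe directory a client started from the current shell would use and whether it exists at all. The sketch below is illustrative: `pipe_dir_status` is a hypothetical helper, and it assumes the daemon creates its pipe directory on startup. An existing directory is necessary but not sufficient, since a crashed daemon can leave it behind.

```shell
#!/usr/bin/env bash
# Sketch: report the pipe directory a client in this shell would use.
# pipe_dir_status is a hypothetical helper name.

pipe_dir_status() {
  # Use the argument if given, else the environment variable,
  # else the default location /tmp/nvidia-mps.
  dir="${1:-${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}}"
  if [ -d "$dir" ]; then
    echo "exists: $dir"
  else
    echo "missing: $dir"
    return 1
  fi
}

pipe_dir_status || echo "No pipe directory found; start the daemon with the same CUDA_MPS_PIPE_DIRECTORY."
```

If the directory is reported as missing, pointing the client at it cannot succeed until a daemon has been started with the same `CUDA_MPS_PIPE_DIRECTORY` value.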
