cudaErrorMpsClientFailed (803) · cuda · runtime_error · ai_generated: true

CUDA error: MPS client failed to connect (cudaErrorMpsClientFailed)

ID: cuda/cuda-error-mps-client-failed

Fix rate: 82%
Confidence: 88%
Evidence count: 1
First seen: 2024-06-10

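Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| CUDA 12.0 | active | | | |
| CUDA 12.1 | active | | | |
| CUDA 12.3 | active | | | |
| NVIDIA Driver 535.129.03 | active | | | |
| NVIDIA Driver 545.23.06 | active | | | |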

Root cause analysis

The CUDA Multi-Process Service (MPS) control daemon is not running or has crashed, so a new MPS client cannot connect to the shared GPU context.

generic

Official documentation

https://docs.nvidia.com/deploy/mps/index.html

Solutions

  1. Start the MPS control daemon before launching the application: `export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps; nvidia-cuda-mps-control -d`
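  2. Run without MPS by unsetting CUDA_MPS_PIPE_DIRECTORY and restarting the process: `unset CUDA_MPS_PIPE_DIRECTORY`. With the variable unset, clients fall back to the default pipe directory (/tmp/nvidia-mps), so this bypasses MPS only if no daemon is listening there.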
  3. Check if MPS daemon is running and restart it: `ps aux | grep nvidia-cuda-mps-control; killall nvidia-cuda-mps-control; nvidia-cuda-mps-control -d`
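
The check-then-start steps above can be combined into a small wrapper. This is a minimal sketch, not an official NVIDIA procedure: the function names and the `MPS_CONTROL_BIN` variable are hypothetical helpers introduced for illustration, and the pipe directory is assumed to be the same `/tmp/nvidia-mps` used in solution 1.

```shell
#!/bin/sh
# Sketch only: ensure an MPS control daemon is running before a CUDA job starts.
# MPS_CONTROL_BIN and both function names are hypothetical, not part of CUDA.

MPS_CONTROL_BIN="${MPS_CONTROL_BIN:-nvidia-cuda-mps-control}"

# Succeeds (exit status 0) if a matching daemon process exists.
# pgrep -f matches against the full command line; the daemon's name is
# longer than the 15 characters that a plain pgrep name match compares.
mps_daemon_running() {
    pgrep -f "$MPS_CONTROL_BIN" >/dev/null 2>&1
}

# Start the daemon only when it is not already running.
ensure_mps_daemon() {
    # Assumed pipe directory; clients must be launched with the same value.
    export CUDA_MPS_PIPE_DIRECTORY="${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"
    if mps_daemon_running; then
        echo "MPS control daemon already running"
        return 0
    fi
    echo "MPS control daemon not running, starting it"
    "$MPS_CONTROL_BIN" -d
}
```

Calling `ensure_mps_daemon` in a job launcher before the CUDA application starts would cover the case where the daemon was never started. A daemon that is present but hung would still need the explicit restart from solution 3.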

Ineffective attempts

Common approaches that do not work:

  1. 80% failure rate

    The error is not caused by missing or corrupt CUDA installations, but by a missing or unresponsive MPS daemon process.

  2. 70% failure rate

    The environment variable only changes the socket path; if the daemon is not running at that path, the connection still fails.
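
To see which path a client would actually try, the variable can be resolved the same way before launching anything. The helper below is a hypothetical sketch. It assumes clients fall back to `/tmp/nvidia-mps` when the variable is unset, and it only checks that the directory exists, which does not prove a daemon is listening there.

```shell
#!/bin/sh
# Sketch only: report which pipe directory a client would use and whether it exists.
# mps_pipe_dir_status is a hypothetical helper name, not part of CUDA.

mps_pipe_dir_status() {
    # Assumption: /tmp/nvidia-mps is the fallback when the variable is unset.
    mps_dir="${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"
    if [ -d "$mps_dir" ]; then
        echo "pipe directory present: $mps_dir"
    else
        echo "pipe directory missing: $mps_dir"
        return 1
    fi
}
```

A "missing" result after changing the variable suggests that no daemon was ever started with that path, which matches the failure described above.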