cudaErrorPeerAccessUnsupported cuda runtime_error ai_generated partial

RuntimeError: CUDA error: peer access is not supported between these two devices (cudaErrorPeerAccessUnsupported)

ID: cuda/peer-access-unsupported-by-hardware

Fix Rate: 70%

Confidence: 84%

Evidence: 1

First Seen: 2023-06-10

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.0	active	—	—	—
CUDA 12.0	active	—	—	—
CUDA 12.3	active	—	—	—

Root Cause

The two GPUs do not support direct peer-to-peer (P2P) memory access, typically due to hardware topology (e.g., different PCIe switches) or disabled P2P in the driver.

generic
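
Whether a given pair of GPUs supports P2P can be checked before any peer access is attempted. The following is a minimal sketch, assuming PyTorch is installed on the affected machine. `p2p_matrix`, `unsupported_pairs`, and `report_p2p` are hypothetical helper names introduced here for illustration; `torch.cuda.can_device_access_peer` is the PyTorch call that performs the query.

```python
from itertools import permutations


def p2p_matrix(device_count, can_access):
    """Map every ordered pair of distinct device indices to a bool.

    `can_access(src, dst)` is any callable that reports whether device
    `src` can directly access memory on device `dst`.
    """
    return {
        (src, dst): bool(can_access(src, dst))
        for src, dst in permutations(range(device_count), 2)
    }


def unsupported_pairs(matrix):
    """Return the sorted list of (src, dst) pairs without P2P support."""
    return sorted(pair for pair, ok in matrix.items() if not ok)


def report_p2p():
    """Print the GPU pairs on this machine that lack P2P support."""
    import torch  # imported lazily so the helpers above work without it

    count = torch.cuda.device_count()
    matrix = p2p_matrix(count, torch.cuda.can_device_access_peer)
    for src, dst in unsupported_pairs(matrix):
        print(f"GPU {src} cannot directly access GPU {dst}")
```

Calling `report_p2p()` lists the device pairs for which this error should be expected. Running `nvidia-smi topo -m` shows the interconnect topology behind those results.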

Official Documentation

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__PEER.html

Workarounds

75% success Set the environment variable NCCL_P2P_DISABLE=1 before NCCL is initialized, either in the shell that launches the script or from inside the script. For PyTorch DistributedDataParallel, set os.environ['NCCL_P2P_DISABLE'] = '1' before the process group is created. This makes NCCL skip its P2P transport and use shared memory or network-based communication instead.
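
The NCCL_P2P_DISABLE workaround can be sketched in Python as follows. This is a minimal sketch, assuming the variable only has to be present in the process environment before NCCL initializes; `disable_nccl_p2p` is a hypothetical helper name, not part of PyTorch or NCCL.

```python
import os


def disable_nccl_p2p(env=None):
    """Set NCCL_P2P_DISABLE=1 in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    env["NCCL_P2P_DISABLE"] = "1"
    return env


# Call this at the top of the training script, before
# torch.distributed.init_process_group("nccl") creates the NCCL communicator.
disable_nccl_p2p()
```

Setting the variable in the launching shell has the same effect, for example `NCCL_P2P_DISABLE=1 python train.py`, where `train.py` stands in for your own script.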
70% success If using multiple GPUs, run one process per GPU (e.g., via torch.multiprocessing) instead of driving several GPUs from a single process. In each process, call torch.cuda.set_device(rank) and exchange data through torch.distributed rather than through direct cross-device copies. If NCCL still attempts P2P, combine this with NCCL_P2P_DISABLE=1. Note that NCCL_SHM_DISABLE=1 turns off NCCL's shared-memory transport and pushes traffic onto the network path; it does not by itself disable P2P.
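
The one-process-per-GPU layout might look like the sketch below. It assumes a single machine, a PyTorch build with NCCL support, and a free local port. `worker_env`, `worker`, `launch`, and the port number 29500 are illustrative choices, not part of the original workaround.

```python
import os


def worker_env(master_addr="127.0.0.1", master_port=29500):
    """Environment variables each worker process needs for rendezvous."""
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "NCCL_P2P_DISABLE": "1",  # keep NCCL off the P2P transport
    }


def worker(rank, world_size):
    """Entry point for one process, bound to exactly one GPU."""
    import torch
    import torch.distributed as dist

    os.environ.update(worker_env())
    torch.cuda.set_device(rank)  # this process only uses GPU `rank`
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    value = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(value)  # sums across ranks, so value == world_size
    print(f"rank {rank}: all_reduce result = {value.item()}")

    dist.destroy_process_group()


def launch():
    """Spawn one worker process per visible GPU."""
    import torch
    import torch.multiprocessing as mp

    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Because `torch.multiprocessing.spawn` starts fresh interpreter processes, `launch()` should be called from under an `if __name__ == "__main__":` guard in the script that defines `worker`.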

Dead Ends

Common approaches that don't work:

90% fail
Enabling P2P through software (for example, by calling cudaDeviceEnablePeerAccess) cannot override a hardware limitation; the call still fails.
80% fail
Rebooting the system does not change the GPU topology; if the hardware does not support P2P between the two devices, it remains unsupported after a restart.