cudaErrorPeerAccessUnsupported
Tags: cuda, runtime_error, ai_generated, partial
RuntimeError: CUDA error: peer access is not supported between these two devices (cudaErrorPeerAccessUnsupported)
ID: cuda/peer-access-unsupported-by-hardware
Fix rate: 70%
Confidence: 84%
Evidence: 1
First seen: 2023-06-10
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.0 | active | — | — | — |
| CUDA 12.0 | active | — | — | — |
| CUDA 12.3 | active | — | — | — |
Root Cause
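The two GPUs do not support direct peer-to-peer (P2P) memory access, so operations that require one device to access the other's memory directly fail. This is typically due to the hardware topology (e.g., the GPUs are attached through different PCIe switches or to different CPU sockets) or to P2P being disabled in the driver. cudaDeviceCanAccessPeer reports whether a given pair of devices supports P2P.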
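To confirm the diagnosis before applying a workaround, you can query P2P support for every device pair. The sketch below assumes PyTorch, whose torch.cuda.can_device_access_peer reports whether one device can access another's memory. The helper names p2p_matrix, unsupported_pairs and torch_p2p_matrix are hypothetical, and the capability check is passed in as a callable so the logic can be exercised without GPUs.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]


def p2p_matrix(num_devices: int, can_access: Callable[[int, int], bool]) -> Dict[Pair, bool]:
    """Query P2P support for every ordered pair of distinct devices.

    The capability is asked per direction, so (0, 1) and (1, 0) are
    checked separately.
    """
    return {
        (src, dst): bool(can_access(src, dst))
        for src, dst in permutations(range(num_devices), 2)
    }


def unsupported_pairs(matrix: Dict[Pair, bool]) -> List[Pair]:
    """Return the ordered pairs that cannot use P2P, sorted for stable output."""
    return sorted(pair for pair, ok in matrix.items() if not ok)


def torch_p2p_matrix() -> Dict[Pair, bool]:
    """Build the matrix on a real machine using PyTorch."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return p2p_matrix(torch.cuda.device_count(), torch.cuda.can_device_access_peer)
```

On a multi-GPU machine, `unsupported_pairs(torch_p2p_matrix())` lists the (source, peer) pairs for which P2P is unavailable; if it is non-empty, the workarounds apply. `nvidia-smi topo -m` gives a complementary view of how the GPUs are connected.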
Official Documentation
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__PEER.html
Workarounds
- 75% success: Disable NCCL's peer-to-peer transport by setting the environment variable NCCL_P2P_DISABLE=1 before launching the script. For PyTorch DistributedDataParallel, you can instead set it in code with os.environ['NCCL_P2P_DISABLE'] = '1', as long as this runs before torch.distributed.init_process_group() is called. NCCL then routes inter-GPU traffic through shared memory or the network instead.
- 70% success: If using multiple GPUs, assign each GPU to a separate process (e.g., via torch.multiprocessing) to avoid code paths that need P2P between devices within one process. For example, call torch.cuda.set_device(rank) in each process and communicate via torch.distributed with NCCL_SHM_DISABLE=1. Note that NCCL_SHM_DISABLE=1 also turns off NCCL's shared-memory transport, so traffic goes over the network transport, which is typically slower.
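The first workaround can be sketched as follows for a DDP script, assuming it is launched with torchrun (which exports LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT to every process). The helper names disable_nccl_p2p and setup_ddp are hypothetical; the essential point is that NCCL_P2P_DISABLE is set before torch.distributed.init_process_group() runs.

```python
import os
from typing import MutableMapping, Optional


def disable_nccl_p2p(env: Optional[MutableMapping[str, str]] = None) -> MutableMapping[str, str]:
    """Turn off NCCL's peer-to-peer transport.

    Defaults to the real process environment; a plain dict can be passed
    instead (useful for testing). NCCL reads its settings when it
    initializes, so this must run before init_process_group().
    """
    target = os.environ if env is None else env
    target["NCCL_P2P_DISABLE"] = "1"
    return target


def setup_ddp() -> int:
    """Initialize one DDP rank and return its local GPU index.

    Assumes launch via torchrun, which provides LOCAL_RANK and the
    rendezvous variables used by the default env:// init method.
    """
    disable_nccl_p2p()  # must happen before the NCCL process group exists

    import torch  # imported lazily so the helper above works without PyTorch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # one GPU per process
    dist.init_process_group(backend="nccl")
    return local_rank
```

A hypothetical train.py would call setup_ddp() before wrapping its model in DistributedDataParallel and be started with, e.g., `torchrun --nproc_per_node=2 train.py`. Exporting NCCL_P2P_DISABLE=1 in the shell before launching has the same effect without touching the code.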
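The second workaround, one process per GPU, might look like this with torch.multiprocessing.spawn. This is a minimal single-node sketch: the rendezvous address and port (127.0.0.1:29500), the helper names worker_env, worker and launch, and the toy all_reduce are assumptions for illustration, while NCCL_SHM_DISABLE=1 is applied as the workaround suggests.

```python
import os
from typing import Dict

# Assumed single-node rendezvous settings; adjust for your machine.
MASTER_ADDR = "127.0.0.1"
MASTER_PORT = "29500"


def worker_env() -> Dict[str, str]:
    """Variables each worker sets before joining the process group.

    NCCL_SHM_DISABLE=1 follows the workaround; it disables NCCL's
    shared-memory transport, so traffic uses the network transport.
    """
    return {
        "MASTER_ADDR": MASTER_ADDR,
        "MASTER_PORT": MASTER_PORT,
        "NCCL_SHM_DISABLE": "1",
    }


def worker(rank: int, world_size: int) -> None:
    """Runs in its own process and owns exactly one GPU, cuda:<rank>."""
    import torch
    import torch.distributed as dist

    os.environ.update(worker_env())
    torch.cuda.set_device(rank)  # pin this process to a single device
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    try:
        t = torch.ones(1, device=f"cuda:{rank}")
        dist.all_reduce(t)  # default op is SUM, so t == world_size afterwards
    finally:
        dist.destroy_process_group()


def launch() -> None:
    """Start one worker process per visible GPU."""
    import torch
    import torch.multiprocessing as mp

    world_size = torch.cuda.device_count()
    # spawn calls worker(process_index, *args) in each child process
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```

Call launch() from under an `if __name__ == "__main__":` guard, since the spawn start method re-imports the main module in every child process.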
Dead Ends
Common approaches that don't work:
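- 90% fail: Enabling P2P via software flags (e.g., calling cudaDeviceEnablePeerAccess regardless of what cudaDeviceCanAccessPeer reports) cannot override a hardware limitation; the operation will still fail.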
- 80% fail: Rebooting the system does not change the GPU topology; if P2P is unsupported by the hardware, it remains unsupported.