# RuntimeError: CUDA error: peer access is not supported between these two devices (cudaErrorPeerAccessUnsupported)

- **ID:** `cuda/peer-access-unsupported-by-hardware`
- **Domain:** cuda
- **Category:** runtime_error
- **Error code:** `cudaErrorPeerAccessUnsupported`
- **Verification level:** ai_generated
- **Fix rate:** 70%

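## Root cause

The two GPUs do not support direct peer-to-peer (P2P) memory access, usually because of the hardware topology (for example, the GPUs sit behind different PCIe switches) or because P2P is disabled in the driver.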
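
To confirm which GPU pairs are affected, the following is a minimal sketch that lists every ordered device pair without peer access. The helper names `unsupported_peer_pairs` and `torch_unsupported_pairs` are hypothetical and introduced only for illustration; the sketch assumes PyTorch is installed and relies on `torch.cuda.can_device_access_peer` for the actual query.

```python
from itertools import permutations


def unsupported_peer_pairs(device_count, can_access):
    """Return ordered (src, dst) pairs for which can_access(src, dst) is False.

    `can_access` is any callable taking two device indices, so the helper
    can be exercised without a GPU by passing a stand-in function.
    """
    return [
        (src, dst)
        for src, dst in permutations(range(device_count), 2)
        if not can_access(src, dst)
    ]


def torch_unsupported_pairs():
    """Query the local machine through PyTorch (assumes torch is installed)."""
    import torch  # imported lazily so the helper above works without PyTorch

    return unsupported_peer_pairs(
        torch.cuda.device_count(),
        torch.cuda.can_device_access_peer,
    )
```

Calling `torch_unsupported_pairs()` on the affected machine returns an empty list when every pair supports P2P. Running `nvidia-smi topo -m` alongside it shows how the GPUs are connected, which helps explain any pairs that are reported.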

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| CUDA 11.0 | active | — | — |
| CUDA 12.0 | active | — | — |
| CUDA 12.3 | active | — | — |

## Solutions

1. Disable the peer-to-peer transport in NCCL by setting the environment variable `NCCL_P2P_DISABLE=1` before launching the script. With PyTorch `DistributedDataParallel`, you can instead set `os.environ['NCCL_P2P_DISABLE'] = '1'` in code, provided it runs before the process group is initialized. NCCL then falls back to shared memory or network-based communication.
2. When using multiple GPUs, assign each GPU to its own process (e.g., via `torch.multiprocessing`) so that no process accesses another GPU's memory directly: call `torch.cuda.set_device(rank)` in each process and communicate through `torch.distributed`. Setting `NCCL_SHM_DISABLE=1` as well also turns off the shared-memory transport, leaving only network-based communication, so add it only if shared memory is itself a problem.
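
The two steps above can be combined as in the sketch below. It is an illustration under stated assumptions, not a definitive setup: `configure_nccl_env` and `worker` are hypothetical names, the rendezvous address and port are placeholder values for a single-node run, and PyTorch with NCCL support is assumed to be installed.

```python
import os


def configure_nccl_env(env=None, disable_shm=False):
    """Set NCCL transport variables; must run before NCCL is initialized."""
    env = os.environ if env is None else env
    env["NCCL_P2P_DISABLE"] = "1"  # skip the direct GPU-to-GPU (P2P) transport
    if disable_shm:
        # Also skip shared memory, which leaves only the network transport.
        env["NCCL_SHM_DISABLE"] = "1"
    return env


def worker(rank, world_size):
    """Hypothetical per-process entry point: one process drives one GPU."""
    import torch
    import torch.distributed as dist

    configure_nccl_env()
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # assumed single-node rendezvous
    os.environ.setdefault("MASTER_PORT", "29500")      # assumed free port
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        tensor = torch.ones(1, device="cuda")
        dist.all_reduce(tensor)  # default op is SUM across all ranks
    finally:
        dist.destroy_process_group()
```

One way to launch it is `torch.multiprocessing.spawn(worker, args=(world_size,), nprocs=world_size)` under an `if __name__ == "__main__":` guard; `spawn` passes the process index as the first argument, which serves as `rank` here. The environment variables are set inside each worker so they are in place before `init_process_group` creates the NCCL communicator.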

## Failed attempts

- **Enabling P2P through software flags:** this cannot override a hardware limitation, so the operation still fails. (90% failure rate)
- **Rebooting the system:** a reboot does not change the GPU topology; if the hardware does not support P2P, it stays unsupported. (80% failure rate)
