# RuntimeError: CUDA error: an illegal memory access was encountered

- **ID:** `pytorch/cuda-error-illegal-memory-access`
- **Domain:** pytorch
- **Category:** runtime_error
- **Error code:** `CUDA_ERROR_ILLEGAL_ADDRESS`
- **Verification level:** ai_generated
- **Fix rate:** 75%

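## Root Cause

A kernel attempted to read or write memory outside its allocated region, typically because of out-of-bounds tensor indexing or a corrupted pointer.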
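
As a minimal sketch of catching the out-of-bounds case before it reaches a kernel, the check below validates an index tensor against the size of the dimension it indexes. The helper name `assert_indices_in_bounds` and the tensors `table`, `good`, and `bad` are hypothetical, introduced only for illustration. The check also treats negative indices as invalid, which is stricter than plain PyTorch indexing.

```python
import torch


def assert_indices_in_bounds(idx: torch.Tensor, size: int, name: str = "idx") -> None:
    """Hypothetical helper: raise IndexError if any entry of idx is outside [0, size)."""
    if idx.numel() == 0:
        return  # an empty index tensor cannot be out of range
    # int(...) copies the value to the host, which forces a sync on GPU tensors.
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"{name} out of range: min={lo}, max={hi}, valid range is [0, {size})"
        )


table = torch.randn(10, 4)        # 10 rows, so valid row indices are 0..9
good = torch.tensor([0, 3, 9])
bad = torch.tensor([0, 3, 10])    # 10 is one past the last valid row

assert_indices_in_bounds(good, table.size(0))
rows = table[good]                # safe: every index was checked first
```

Calling `assert_indices_in_bounds(bad, table.size(0))` raises an `IndexError` naming the offending range. Because the check synchronizes when `idx` lives on the GPU, it is probably best kept to debugging runs.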

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| pytorch>=1.10 | active | — | — |
| cuda>=11.0 | active | — | — |
| cudnn>=8.0 | active | — | — |

## Solutions

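1. Enable synchronous kernel launches to pinpoint the failing line: set the environment variable `CUDA_LAUNCH_BLOCKING=1` before running the script, then rerun it and read the traceback. Without this setting, CUDA errors are reported asynchronously, so the traceback may point at a later, unrelated call.
2. Check every dynamically computed index against the size of the dimension it indexes. Where clamping is semantically acceptable, use `torch.clamp` or `torch.where` to keep indices in bounds, for example: `idx = torch.clamp(idx, 0, tensor.size(0) - 1)`. Clamping hides the bad index instead of explaining it, so also find out why the index went out of range.
3. Call `torch.cuda.synchronize()` after suspicious operations to force synchronization, so the error surfaces close to the operation that caused it.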
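
The steps above can be sketched as follows. This is an illustration under stated assumptions, not a definitive recipe. It assumes the environment variable is set before anything initializes CUDA, which is why it is set before importing `torch`. The helper `sync_checkpoint` and the tensors `table` and `idx` are hypothetical names. The example falls back to the CPU when no GPU is available.

```python
import os

# Assumption: nothing has initialized CUDA yet. Setting the variable before
# importing torch is the conservative choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def sync_checkpoint(label: str) -> None:
    """Hypothetical helper: wait for queued GPU work so a pending CUDA error
    is raised here, tagged with a label, instead of at some later call."""
    if not torch.cuda.is_available():
        return  # nothing to synchronize on a CPU-only machine
    try:
        torch.cuda.synchronize()
    except RuntimeError as err:
        raise RuntimeError(f"CUDA error surfaced at checkpoint '{label}'") from err


device = "cuda" if torch.cuda.is_available() else "cpu"
table = torch.randn(10, 4, device=device)
idx = torch.tensor([0, 3, 12], device=device)  # 12 is out of range for 10 rows

# Clamp only if mapping an out-of-range index to the nearest valid row is acceptable.
safe_idx = torch.clamp(idx, 0, table.size(0) - 1)
rows = table[safe_idx]
sync_checkpoint("after row lookup")
```

Exporting `CUDA_LAUNCH_BLOCKING=1` in the shell before launching the script has the same effect as setting it in code. Synchronous launches slow execution, so this setting is usually removed once the failing line is found.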

## What Doesn't Work

- **Increasing GPU memory or adding more GPUs**: The error concerns an invalid access, not memory capacity; more memory does not fix an invalid pointer or index. (90% failure rate)
- **Rebooting the machine or resetting the CUDA context**: The root cause is in the code logic; a reboot may temporarily mask the issue, but it recurs. (70% failure rate)
- **Switching to CPU mode entirely**: Avoids the CUDA error but defeats the purpose of using GPU acceleration. (50% failure rate)
