# RuntimeError: CUDA error: device-side assert triggered. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

- **ID:** `pytorch/cuda-error-devices-synchronize-abort`
- **Domain:** pytorch
- **Category:** runtime_error
- **Error code:** `CUDA_ERROR_ASSERT`
- **Verification level:** ai_generated
- **Fix rate:** 85%

## Root Cause

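A CUDA kernel hit a failed assertion on the device. Typical causes are an out-of-range index passed to an embedding layer, a negative dimension, or NaN values reaching a loss function. CUDA kernels run asynchronously, so the failure is usually reported only at a later CUDA call. As a result, the operation named in the stack trace is often not the one that actually failed.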
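The embedding case can be caught before the lookup reaches the GPU kernel. The sketch below assumes integer index tensors. `validate_indices` is a hypothetical helper, not a PyTorch API. It forces a device-to-host copy on every call, so it is meant for debugging rather than the training hot path.

```python
import torch
import torch.nn as nn


def validate_indices(indices: torch.Tensor, num_embeddings: int) -> None:
    """Raise a readable ValueError if any index is outside [0, num_embeddings).

    Hypothetical helper: min()/max() are copied back to the host before the
    lookup, so a bad index fails here with a clear message instead of as a
    device-side assert inside the embedding kernel.
    """
    if indices.numel() == 0:
        return
    lo = int(indices.min())
    hi = int(indices.max())
    if lo < 0 or hi >= num_embeddings:
        raise ValueError(
            f"embedding index out of range: min={lo}, max={hi}, "
            f"valid range is [0, {num_embeddings - 1}]"
        )


emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

ok = torch.tensor([0, 3, 9])
validate_indices(ok, emb.num_embeddings)  # passes silently
out = emb(ok)                             # shape (3, 4)

bad = torch.tensor([0, 3, 10])            # 10 is one past the last row
try:
    validate_indices(bad, emb.num_embeddings)
except ValueError as exc:
    print(exc)  # embedding index out of range: min=0, max=10, valid range is [0, 9]
```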

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| PyTorch 2.0.0 | active | — | — |
| CUDA 11.7 | active | — | — |
| CUDA 11.8 | active | — | — |
| CUDA 12.1 | active | — | — |
| Ubuntu 22.04 | active | — | — |

## Solutions

1. ```
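   TORCH_USE_CUDA_DSA is a build-time option. It only takes effect when PyTorch is compiled from source with TORCH_USE_CUDA_DSA=1 set, so exporting it before running a prebuilt wheel does not enable it. With prebuilt binaries, run with CUDA_LAUNCH_BLOCKING=1 instead. This makes kernels launch synchronously, so the stack trace points at the failing operation (e.g., an embedding lookup with an out-of-range index). Example: CUDA_LAUNCH_BLOCKING=1 python train.py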
   ```
2. ```
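   Add NaN checks and gradient clipping to the training loop. Check the loss before backward(). Clip after backward() but before optimizer.step(). Clipping only helps when exploding gradients produce the NaN; it does not fix out-of-range indices. For example:
   if torch.isnan(loss): print('NaN loss'); return
   loss.backward()
   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
   optimizer.step()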
   ```
3. ```
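   Wrap the suspect operation in try/except and call torch.cuda.synchronize() inside the try block. An asynchronous kernel failure is then raised there and can be tied to a specific iteration. A device-side assert leaves the CUDA context unusable, so log and re-raise rather than continuing. For example:
   try:
       output = model(input)
       torch.cuda.synchronize()
   except RuntimeError as e:
       print(f'Error at iteration {i}: {e}')
       raise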
   ```
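The NaN check, gradient clipping, and per-iteration synchronization can be combined into one loop. This is a minimal sketch that assumes `batches` is an iterable of `(inputs, targets)` tensor pairs. `train_epoch`, `make_model`, and the `(iteration, exception)` return convention are hypothetical. The toy model is an `nn.Embedding` trained with `MSELoss`, so the example also runs on a CPU-only machine. The loop raises instead of printing and returning, so the caller decides what to do with the failing batch.

```python
import torch
import torch.nn as nn


def train_epoch(model, optimizer, loss_fn, batches, device, max_norm=1.0):
    """Run one epoch; return (iteration, exception) for the first failing step,
    or None if every step succeeds (hypothetical helper)."""
    model.to(device).train()
    for i, (x, y) in enumerate(batches):
        try:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            # Stop before backward() if the loss is already NaN/Inf.
            if not torch.isfinite(loss):
                raise FloatingPointError(f"non-finite loss: {loss.item()}")
            loss.backward()
            # Clip after backward() (gradients exist) and before step().
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
            optimizer.step()
            if device.type == "cuda":
                # Force pending kernels to finish so an asynchronous failure
                # is raised inside this iteration's try block.
                torch.cuda.synchronize(device)
        except (RuntimeError, IndexError, FloatingPointError) as exc:
            # A device-side assert leaves the CUDA context unusable: stop here.
            return i, exc
    return None


def make_model():
    torch.manual_seed(0)
    return nn.Embedding(num_embeddings=10, embedding_dim=1)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_fn = nn.MSELoss()
batches = [
    (torch.tensor([0, 1, 2]), torch.zeros(3, 1)),
    (torch.tensor([3, 4, 5]), torch.zeros(3, 1)),
    (torch.tensor([6, 7, 12]), torch.zeros(3, 1)),  # 12 is out of range
]

model = make_model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
result = train_epoch(model, optimizer, loss_fn, batches, device)
if result is not None:
    i, exc = result
    print(f"Error at iteration {i}: {type(exc).__name__}: {exc}")
```

On CUDA, the `bool` conversion in the `isfinite` check already synchronizes. A failure in the forward pass therefore usually surfaces there, and the explicit `synchronize` catches anything launched by `backward()` or `step()`.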

## Ineffective Attempts

- **Set torch.backends.cudnn.deterministic = True**: Deterministic mode does not fix invalid tensor values or index errors; it only makes operations reproducible. (95% failure rate)
- **Increase batch size to trigger the error less often**: Larger batch sizes may hide the issue temporarily but do not address the root cause (e.g., out-of-range indices in an embedding). The error reappears on different data. (90% failure rate)
- **Set the CUDA_LAUNCH_BLOCKING=1 environment variable**: This helps identify the exact operation that fails, but it is a diagnostic step, not a fix. The underlying problem, such as an index error or NaN values, remains. (85% failure rate)
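These settings change only when or how the failure is reported; the offending input still has to be found and fixed. One complementary check is to replay the suspect batch on CPU in a fresh process. This is not one of the solutions listed above and is shown only as a sketch. CPU kernels validate their arguments as they run, so an out-of-range index raises an ordinary Python exception (typically `IndexError`) with an accurate traceback. `reproduce_on_cpu` is a hypothetical helper. Use a fresh process because a CUDA context that has already hit a device-side assert cannot be used to copy the model.

```python
import copy
from typing import Optional

import torch
import torch.nn as nn


def reproduce_on_cpu(model: nn.Module, *inputs: torch.Tensor) -> Optional[BaseException]:
    """Replay one forward pass on CPU and return the exception it raises, if any
    (hypothetical helper)."""
    cpu_model = copy.deepcopy(model).cpu()        # leave the original model untouched
    cpu_inputs = [t.detach().cpu() for t in inputs]
    try:
        with torch.no_grad():
            cpu_model(*cpu_inputs)
    except (IndexError, RuntimeError) as exc:
        return exc
    return None


emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
err = reproduce_on_cpu(emb, torch.tensor([1, 2, 10]))  # 10 is out of range
print(type(err).__name__, err)
```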
