# RuntimeError: CUDA error: device-side assert triggered. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions

- **ID:** `pytorch/cuda-assert-triggered`
- **Domain:** pytorch
- **Category:** assertion_error
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

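A CUDA kernel hit a device-side assertion, most often because of an out-of-bounds index (for example, an embedding index or a class label outside the valid range) or because invalid values such as NaN reached an operation that validates its input. Without a `TORCH_USE_CUDA_DSA` build the detailed assertion information is suppressed, and because CUDA errors are reported asynchronously, the Python stack trace may point at a later, unrelated line.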
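
To make the out-of-range label case concrete, here is a minimal sketch that reruns a loss computation on CPU copies of the tensors, where PyTorch usually reports the problem as a readable Python exception. The helper name `cpu_error_for_loss` and the tensor shapes are assumptions made for illustration; they are not part of PyTorch.

```python
import torch
import torch.nn.functional as F


def cpu_error_for_loss(logits, targets):
    """Run cross_entropy on CPU copies; return the error text, or None on success."""
    try:
        F.cross_entropy(logits.detach().cpu(), targets.detach().cpu())
    except (IndexError, RuntimeError) as exc:
        return str(exc)
    return None


logits = torch.randn(4, 3)                 # 3 classes, so valid labels are 0, 1, 2
bad_targets = torch.tensor([0, 1, 2, 3])   # label 3 is out of range
print(cpu_error_for_loss(logits, bad_targets))
```

This is only a diagnostic step for locating the bad input. The exact wording of the CPU error can differ between PyTorch versions.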

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| torch 1.13.1 | active | — | — |
| torch 2.0.0 | active | — | — |
| cuda 11.7 | active | — | — |
| cuda 12.0 | active | — | — |

## Workarounds

1. **Rebuild PyTorch from source with `TORCH_USE_CUDA_DSA=1` to get detailed error messages, then rerun and check the exact line causing the assertion.** (90% success)
   ```
   export TORCH_USE_CUDA_DSA=1
   # Assumes pip can obtain a torch source distribution; if it cannot,
   # build from a checkout of the PyTorch repository instead.
   pip install --no-cache-dir --verbose torch --no-binary torch
   ```
2. **Add explicit checks before CUDA operations: verify index bounds and check the loss for NaN/Inf.** (85% success)
   ```
   import torch

   assert (indices >= 0).all() and (indices < tensor.size(0)).all(), "Index out of bounds"
   assert torch.isfinite(loss).all(), "Loss contains NaN or Inf"
   ```
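
The checks in workaround 2 can be wrapped in small reusable helpers. The sketch below is one possible shape, not an established API: `check_indices` and `check_finite` are hypothetical names. They raise `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`.

```python
import torch


def check_indices(indices: torch.Tensor, size: int, name: str = "indices") -> None:
    """Raise ValueError if any index falls outside [0, size)."""
    if indices.numel() == 0:
        return
    lo, hi = int(indices.min()), int(indices.max())
    if lo < 0 or hi >= size:
        raise ValueError(
            f"{name} out of bounds: min={lo}, max={hi}, valid range is [0, {size})"
        )


def check_finite(value: torch.Tensor, name: str = "loss") -> None:
    """Raise ValueError if the tensor contains NaN or Inf."""
    if not torch.isfinite(value).all():
        raise ValueError(f"{name} contains NaN or Inf")


# Example usage with valid inputs: neither call raises.
check_indices(torch.tensor([0, 2, 4]), size=5)
check_finite(torch.tensor(0.25))
```

On GPU tensors each check forces a host-device synchronization, so it may be worth enabling them only while debugging.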

## Dead Ends

- **Catch and retry:** Catching the exception and retrying masks the root cause (e.g., an invalid index) and can cause silent data corruption. (95% fail)
- **Tune hyperparameters:** Increasing the batch size or changing the learning rate does not fix illegal memory accesses or index errors. (90% fail)
- **Fall back to CPU:** Disabling CUDA and running on CPU may work, but it is not a real fix and may be impractically slow. (80% fail)
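
Rather than any of the above, it usually helps to first make the traceback trustworthy. CUDA reports kernel errors asynchronously, so a common debugging step is to set the `CUDA_LAUNCH_BLOCKING` environment variable before CUDA is initialized. The sketch below does this from Python; the helper name `enable_synchronous_cuda_errors` is hypothetical, and the approach assumes nothing in the process has touched CUDA yet.

```python
import os


def enable_synchronous_cuda_errors() -> None:
    """Ask CUDA to launch kernels synchronously so errors surface at the failing call.

    Must run before the process initializes CUDA; the safest place is the very
    top of the entry script, before `import torch`.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_synchronous_cuda_errors()
# import torch  # import torch and build the model only after the variable is set
```

Synchronous launches slow training noticeably, so this setting is meant for debugging runs only.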
