# RuntimeError: CUDA error: an illegal memory access was encountered

- **ID:** `pytorch/cuda-error-illegal-memory-access`
- **Domain:** pytorch
- **Category:** runtime_error
- **Error Code:** `CUDA_ERROR_ILLEGAL_ADDRESS`
- **Verification:** ai_generated
- **Fix Rate:** 75%

## Root Cause

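A CUDA kernel read or wrote memory outside its allocated region, most often because of out-of-bounds tensor indexing or a corrupted pointer. Kernel launches are asynchronous, so the error may be reported at a later, unrelated CUDA call, and the line shown in the traceback is often not the one that caused it.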
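
One way to catch the out-of-bounds case before it reaches a kernel is to validate index tensors on the host. The sketch below uses a hypothetical helper, `check_index_bounds`, that raises a readable `IndexError` instead of letting the GPU fault. It assumes indices must lie in `[0, size)`, which is stricter than plain tensor indexing (where negative values wrap around).

```python
import torch


def check_index_bounds(idx: torch.Tensor, size: int, name: str = "idx") -> None:
    """Hypothetical helper: raise a readable IndexError if any value in `idx`
    falls outside [0, size). Call it just before an indexing or embedding op."""
    if idx.numel() == 0:
        return  # nothing to check
    # int() copies the reduced values to the host, which also synchronizes
    # with the GPU when `idx` lives on a CUDA device.
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"{name} contains values in [{lo}, {hi}], "
            f"but the valid range is [0, {size - 1}]"
        )


# Usage (works on CPU or GPU tensors):
weights = torch.randn(10, 4)
good = torch.tensor([0, 3, 9])
check_index_bounds(good, weights.size(0))  # passes silently
```

Because the check forces a host sync, it is best enabled while debugging and removed or guarded once the faulty index source is found.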

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| pytorch>=1.10 | active | — | — |
| cuda>=11.0 | active | — | — |
| cudnn>=8.0 | active | — | — |
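
To compare an environment against this table, a small sketch like the following collects the installed versions. `cuda_stack_versions` is a hypothetical name; on CPU-only builds the CUDA and cuDNN entries come back as `None`.

```python
import torch


def cuda_stack_versions() -> dict:
    """Hypothetical helper: gather the versions the compatibility table refers to.
    torch.version.cuda and torch.backends.cudnn.version() are None on CPU-only builds."""
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),  # an integer when cuDNN is present
        "gpu_available": torch.cuda.is_available(),
    }


print(cuda_stack_versions())
```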

## Workarounds

1. **Enable synchronous CUDA debugging to pinpoint the exact line.** Set the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching the script, rerun it, and check the traceback. With blocking launches, the error is raised at the operation that caused it rather than at a later call. (80% success)
   ```bash
   CUDA_LAUNCH_BLOCKING=1 python your_script.py
   ```
2. **Keep dynamic indices within bounds with `torch.clamp` or `torch.where`.** Clamp (or mask) every index computed at runtime before using it to index a tensor. (75% success)
   ```python
   # Force every index into the valid range [0, tensor.size(0) - 1]
   idx = torch.clamp(idx, 0, tensor.size(0) - 1)
   ```
3. **Call `torch.cuda.synchronize()` after suspicious operations.** Forcing synchronization makes any pending kernel error surface at that point, close to the code that caused it. (70% success)
   ```python
   out = model(inputs)       # suspicious operation
   torch.cuda.synchronize()  # any error from the queued kernels is raised here
   ```
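
When the failing script is started from Python (for example from a notebook or another tool) rather than from a shell, setting `os.environ` inside the already-running process has no effect once CUDA has been initialized. The sketch below, using a hypothetical helper `run_with_launch_blocking`, instead starts a fresh interpreter with `CUDA_LAUNCH_BLOCKING=1` in its environment.

```python
import os
import subprocess
import sys


def run_with_launch_blocking(script: str, *args: str) -> int:
    """Hypothetical wrapper: run `script` in a fresh Python process with
    CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    # Copy the current environment and add the flag; a new process guarantees
    # the variable is set before any CUDA context is created.
    env = {**os.environ, "CUDA_LAUNCH_BLOCKING": "1"}
    result = subprocess.run([sys.executable, script, *args], env=env)
    return result.returncode
```

For example, `run_with_launch_blocking("train.py", "--epochs", "1")`, where `train.py` and its arguments stand in for your own script.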
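
Clamping keeps every access in bounds but silently redirects a bad index to the first or last row. If that substitution would hide a bug, masking with `torch.where` is an alternative. The hypothetical helper `gather_rows_masked` below (assuming a 2-D table and a 1-D index tensor) routes invalid indices to a safe row, then overwrites those results with a fill value; the last line applies plain clamping to the same input for comparison.

```python
import torch


def gather_rows_masked(table: torch.Tensor, idx: torch.Tensor, fill: float = 0.0) -> torch.Tensor:
    """Hypothetical helper: look up rows of a 2-D `table` with a 1-D `idx`,
    returning `fill` for any index outside [0, table.size(0))."""
    n = table.size(0)
    valid = (idx >= 0) & (idx < n)                               # True where the index is in bounds
    safe_idx = torch.where(valid, idx, torch.zeros_like(idx))   # send bad indices to row 0
    rows = table[safe_idx]                                       # every access is now in bounds
    # Replace the rows that came from invalid indices with the fill value.
    return torch.where(valid.unsqueeze(-1), rows, torch.full_like(rows, fill))


table = torch.arange(6.0).reshape(3, 2)   # rows: [0, 1], [2, 3], [4, 5]
idx = torch.tensor([0, 2, 5, -1])         # 5 and -1 are out of bounds
print(gather_rows_masked(table, idx))
print(table[torch.clamp(idx, 0, table.size(0) - 1)])  # clamp maps 5 -> row 2, -1 -> row 0
```

Both variants only stop the kernel from faulting; once the out-of-range index is located, fixing where it is produced is usually the more durable repair.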
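
To apply the synchronize approach to several suspicious steps without scattering bare calls through the code, each step can be wrapped in a small context manager. `cuda_checkpoint` is a hypothetical name: it calls `torch.cuda.synchronize()` only when a GPU is available and re-raises any `RuntimeError` from the block, including an asynchronous CUDA error surfaced by the sync, with the step's label attached.

```python
import contextlib

import torch


@contextlib.contextmanager
def cuda_checkpoint(label: str):
    """Hypothetical helper: synchronize at the end of a block so errors from kernels
    queued inside it are raised here, tagged with `label`. Skips the sync without CUDA."""
    try:
        yield
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for all queued kernels on the current device
    except RuntimeError as exc:
        # Re-raise with the checkpoint name so the log shows which block failed.
        raise RuntimeError(f"[cuda_checkpoint:{label}] {exc}") from exc


x = torch.ones(3)
with cuda_checkpoint("add"):
    y = x + 1
```

Wrapping, say, the forward pass, the loss, and the backward pass in separate checkpoints narrows an illegal access down to one of them.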

## Dead Ends

- **Increasing GPU memory or adding more GPUs** — The error is not about memory capacity but invalid access; more memory doesn't fix invalid pointers. (90% fail)
- **Rebooting the machine or resetting the CUDA context** — The root cause is in the code logic; a restart clears the failed state, but the error recurs as soon as the faulty code runs again. (70% fail)
- **Switching to CPU mode entirely** — Avoids the error but defeats the purpose of using GPU acceleration. (50% fail)
