CUDA_ERROR_ILLEGAL_ADDRESS
pytorch
runtime_error
ai_generated
partial
RuntimeError: CUDA error: an illegal memory access was encountered
ID: pytorch/cuda-error-illegal-memory-access
Fix Rate: 75%
Confidence: 85%
Evidence: 1
First Seen: 2023-03-15
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| pytorch>=1.10 | active | — | — | — |
| cuda>=11.0 | active | — | — | — |
| cudnn>=8.0 | active | — | — | — |
Root Cause
A kernel attempted to read or write memory outside its allocated region, often caused by out-of-bounds tensor indexing or corrupted pointers.
Official Documentation
https://pytorch.org/docs/stable/notes/cuda.html#cuda-error-handling
Workarounds
-
80% success
Enable synchronous kernel launches so the traceback points at the operation that actually failed: set the environment variable CUDA_LAUNCH_BLOCKING=1 before running the script, then rerun it and read the traceback. CUDA kernels normally launch asynchronously, so without this setting the error is often reported at a later, unrelated call.
-
75% success
Keep dynamic indices within bounds before they are used, with torch.clamp or torch.where. For example: `idx = torch.clamp(idx, 0, tensor.size(0)-1)`. Clamping stops the crash but can hide the upstream bug that produced the bad index, so also check where the out-of-range values come from.
-
70% success
Call torch.cuda.synchronize() after suspicious operations. It blocks until all queued kernels have finished, so a pending error is raised at that call rather than at a later, unrelated one.
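The index-bounds workaround above can be sketched as follows. This is a minimal illustration, not a drop-in fix: `check_indices` and `safe_gather` are hypothetical helper names, and the check deliberately treats negative indices as invalid, which is stricter than PyTorch's own indexing rules.

```python
import torch


def check_indices(idx: torch.Tensor, size: int) -> None:
    """Raise IndexError if any entry of idx falls outside [0, size).

    Assumption: negative indices are rejected, although PyTorch itself
    accepts values in [-size, size).
    """
    if idx.numel() == 0:
        return
    # int(...) copies the value to the host, which synchronizes with the GPU.
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= size:
        raise IndexError(f"index range [{lo}, {hi}] is outside [0, {size})")


def safe_gather(table: torch.Tensor, idx: torch.Tensor, strict: bool = True) -> torch.Tensor:
    """Select rows of table, either validating idx or clamping it into range."""
    if strict:
        check_indices(idx, table.size(0))
    else:
        idx = torch.clamp(idx, 0, table.size(0) - 1)
    return table[idx]


device = "cuda" if torch.cuda.is_available() else "cpu"
table = torch.arange(12.0, device=device).reshape(4, 3)
bad_idx = torch.tensor([0, 3, 7], device=device)  # 7 is out of range for 4 rows

# Non-strict mode clamps 7 down to 3 instead of letting a kernel read past the end.
clamped = safe_gather(table, bad_idx, strict=False)
```

Strict mode is the better choice while debugging, because it fails with a readable Python exception at the point where the bad index appears. Clamping is more suitable once the source of the bad values is understood.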
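The two debugging workarounds (CUDA_LAUNCH_BLOCKING and torch.cuda.synchronize()) can be combined in one script, roughly as below. This sketch assumes the environment variable is set before anything initializes CUDA; `cuda_checkpoint` is a hypothetical helper name, and the matrix multiply stands in for whatever operation is under suspicion.

```python
import os

# Assumption: nothing has initialized CUDA yet, so the setting takes effect.
# Setting it in the shell before launching the script is the safer route.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def cuda_checkpoint(label: str) -> None:
    """Wait for queued GPU work so that a pending CUDA error is raised here."""
    if not torch.cuda.is_available():
        return  # nothing to synchronize on a CPU-only machine
    try:
        torch.cuda.synchronize()
    except RuntimeError as err:
        raise RuntimeError(f"CUDA error surfaced at checkpoint '{label}'") from err


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 4, device=device)
y = x @ x.T  # placeholder for the suspicious operation
cuda_checkpoint("after matmul")
```

Placing checkpoints around successive blocks of a training step narrows down which block triggers the error. Both techniques slow execution, so they are best removed once the faulty operation has been found.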
Dead Ends
Common approaches that don't work:
-
Increasing GPU memory or adding more GPUs
90% fail
The error is not about memory capacity but invalid access; more memory doesn't fix invalid pointers.
-
Rebooting the machine or resetting CUDA context
70% fail
The root cause is in the code logic; a reboot may temporarily mask the issue, but the error recurs.
-
Switching to CPU mode entirely
50% fail
Sidesteps the GPU error but gives up GPU acceleration, and the underlying indexing bug remains in the code.