cudaErrorIllegalAddress cuda runtime_error ai_generated true

RuntimeError: CUDA error: an illegal memory access was encountered after a cudaFree call on a tensor still in use

ID: cuda/illegal-memory-access-after-free

Fix Rate: 79%
Confidence: 82%
Evidence: 1
First Seen: 2025-01-20

Version Compatibility

Version                   Status   Introduced   Deprecated   Notes
CUDA 12.2                 active
PyTorch 2.2.0             active
NVIDIA Driver 550.54.14   active

Root Cause

A tensor or buffer was freed via cudaFree or torch.cuda.empty_cache while a kernel or asynchronous operation was still reading or writing that memory, which results in a use-after-free on the GPU.
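
As a rough illustration of how this lifetime hazard can arise in PyTorch, the sketch below runs work on a side stream and then drops the input tensor. The helper name double_on_side_stream is hypothetical, and the use of Tensor.record_stream is an assumption that does not come from this entry: it is one way to tell the caching allocator that another stream still uses a buffer. On a machine without CUDA the sketch falls back to a plain CPU computation.

```python
import torch


def double_on_side_stream(x):
    """Hypothetical helper: compute x * 2 on a side stream without a lifetime race."""
    if not x.is_cuda:
        # CPU fallback so the sketch runs anywhere; no streams are involved.
        return x * 2

    main = torch.cuda.current_stream(x.device)
    side = torch.cuda.Stream(device=x.device)

    # The side stream must not start reading x before the main stream has produced it.
    side.wait_stream(main)
    with torch.cuda.stream(side):
        y = x * 2

    # x was allocated on the main stream but is read on `side`. Without this call,
    # dropping the last reference to x could let the allocator reuse its memory
    # while the side-stream kernel is still reading it.
    x.record_stream(side)

    # Later main-stream work waits for the side stream, and y (allocated on `side`)
    # is marked as also used by the main stream.
    main.wait_stream(side)
    y.record_stream(main)
    return y


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(4, device=device)
y = double_on_side_stream(x)
del x  # the allocator has been told about the side-stream use
print(y.cpu().tolist())
```

The point of the sketch is ordering: the buffer's lifetime is tied to the completion of the work that reads it, not to the moment the Python name goes away.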

generic

Official Documentation

https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html

Workarounds

  1. 85% success Ensure all CUDA streams are synchronized before freeing tensors. Example: torch.cuda.synchronize() before calling del tensor or torch.cuda.empty_cache(). For custom kernels, use cudaStreamSynchronize on the relevant stream.
  2. 82% success Use reference counting or weak references to track tensor lifetimes. In PyTorch, keep a strong reference to the tensor until the kernel completes, e.g., by storing it in a list until the next iteration.
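
The two workarounds above can be sketched together as a small keep-alive pool. This is a minimal sketch, not an established API: KeepAlive and FakeEvent are hypothetical names introduced for illustration. The pool holds a strong reference to each buffer until a completion marker reports that the work using it has finished. In PyTorch the marker could plausibly be a torch.cuda.Event recorded right after the kernel launch (its query() method returns True once the recorded work is done), and drain could be given torch.cuda.synchronize. The sketch assumes markers complete in submission order, as they would on a single stream.

```python
from collections import deque


class KeepAlive:
    """Hypothetical helper: hold strong references until a completion marker fires."""

    def __init__(self):
        self._pending = deque()  # (marker, obj) pairs in submission order

    def defer(self, obj, marker):
        """Keep `obj` alive until `marker.query()` returns True."""
        self._pending.append((marker, obj))

    def collect(self):
        """Drop references whose marker reports completion; return how many were dropped."""
        released = 0
        while self._pending and self._pending[0][0].query():
            self._pending.popleft()
            released += 1
        return released

    def drain(self, synchronize):
        """Call `synchronize()` first (workaround 1), then drop every reference."""
        synchronize()
        released = len(self._pending)
        self._pending.clear()
        return released

    def __len__(self):
        return len(self._pending)


class FakeEvent:
    """Stand-in for a GPU event so the sketch runs without CUDA."""

    def __init__(self):
        self.done = False

    def query(self):
        return self.done


pool = KeepAlive()
first, second = FakeEvent(), FakeEvent()
pool.defer(bytearray(16), first)
pool.defer(bytearray(16), second)

first.done = True
released_early = pool.collect()  # only the buffer behind `first` is released
# A real caller would pass torch.cuda.synchronize instead of the no-op lambda.
released_late = pool.drain(lambda: None)
```

Calling collect() once per training iteration keeps the pool small, while drain() is the blunt fallback before an explicit free or torch.cuda.empty_cache().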

Dead Ends

Common approaches that don't work:

  1. 70% fail Synchronization may hide the bug but does not fix the root cause; the free still happens before all uses complete.
  2. 95% fail Memory size is unrelated; the error is about lifetime management, not capacity.