CUBLAS_STATUS_ALLOC_FAILED cuda runtime_error ai_generated partial

CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

ID: cuda/cublas-api-error-on-shutdown

Fix Rate: 75%
Confidence: 85%
Evidence: 1
First Seen: 2023-03-15

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
CUDA 11.8     active
CUDA 12.1     active
cuBLAS 11.11  active
cuBLAS 12.0   active

Root Cause

cuBLAS handle allocation fails due to insufficient GPU memory or driver state corruption, often triggered during rapid context creation/destruction or after a previous CUDA error left the device in an inconsistent state.
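
Because insufficient GPU memory is one of the causes named above, it can help to check how much device memory is free before cuBLAS is initialized. The following is a minimal sketch, assuming PyTorch is the framework in use. `has_headroom`, `query_free_memory`, and the 256 MiB threshold are hypothetical names and values chosen for illustration, not figures from NVIDIA's documentation. `torch.cuda.mem_get_info` reports free and total device memory in bytes.

```python
from typing import Optional, Tuple

MIB = 1024 * 1024


def has_headroom(free_bytes: int, min_free_mib: int = 256) -> bool:
    """Return True if at least `min_free_mib` MiB are free.

    The 256 MiB default is an illustrative assumption, not a documented
    cuBLAS requirement.
    """
    return free_bytes >= min_free_mib * MIB


def query_free_memory(device: int = 0) -> Optional[Tuple[int, int]]:
    """Return (free_bytes, total_bytes), or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports bytes for the whole device, so memory held by
    # other processes counts as used.
    return torch.cuda.mem_get_info(device)


if __name__ == "__main__":
    info = query_free_memory()
    if info is None:
        print("CUDA not available; nothing to check")
    else:
        free_bytes, total_bytes = info
        print(f"free: {free_bytes // MIB} MiB of {total_bytes // MIB} MiB")
        if not has_headroom(free_bytes):
            print("low free memory: cublasCreate may fail with ALLOC_FAILED")
```

Logging this value immediately before model construction makes it easier to tell memory exhaustion apart from the driver-state corruption case, where the error can appear even with plenty of free memory.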

Official Documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublascreate

Workarounds

  1. 70% success: Release PyTorch's unused cached GPU memory by calling `torch.cuda.empty_cache()` before new cuBLAS handles are created, then reinitialize the model in a fresh context. Note that `torch.cuda.reset_peak_memory_stats()` only resets PyTorch's peak-memory counters; it neither frees memory nor resets the device, and `empty_cache()` does not free memory still held by live tensors.
  2. 90% success: Use `nvidia-smi` to identify the processes using the GPU, terminate them (for example with `kill <PID>`), and restart the application. For persistent issues, reboot the machine to fully reset the GPU driver state.
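
The process check in the second workaround can be scripted. The sketch below is one possible approach, assuming `nvidia-smi` is on the `PATH`; it uses the tool's `--query-compute-apps` CSV output to list the compute processes on the GPU. `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names, and the sketch only lists processes; deciding which ones to terminate is left to the operator.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple

# (pid, process name, used memory in MiB or None when not reported)
GpuProcess = Tuple[int, str, Optional[int]]

QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> List[GpuProcess]:
    """Parse the CSV rows printed by QUERY_CMD into tuples."""
    rows: List[GpuProcess] = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # blank or malformed line
        # Split the pid off the front and the memory off the back so that a
        # comma inside the process name does not break parsing.
        pid_text, rest = line.split(",", 1)
        name, used_text = rest.rsplit(",", 1)
        pid_text, used_text = pid_text.strip(), used_text.strip()
        if not pid_text.isdigit():
            continue
        used = int(used_text) if used_text.isdigit() else None
        rows.append((int(pid_text), name.strip(), used))
    return rows


def list_gpu_processes() -> List[GpuProcess]:
    """Return compute processes on the GPU, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=False)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, used in list_gpu_processes():
        print(pid, name, "unknown" if used is None else f"{used} MiB")
```

Reviewing this list before terminating anything avoids killing unrelated jobs on a shared machine.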

Dead Ends

Common approaches that don't work:

  1. 80% fail: The previous CUDA context may still be alive, and residual allocations prevent new handle creation; a full GPU reset or process kill is needed.
  2. 90% fail: The error is not about insufficient memory for tensors but about handle allocation; larger batch sizes exacerbate memory pressure.
  3. 70% fail: The issue is often runtime state corruption, not a missing library; driver version mismatch can cause other errors, but this specific error persists.
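
The first dead end notes that a CUDA context left alive in the current process can block new handle creation. One way to avoid carrying such state from one run to the next, sketched here under the assumption that the GPU workload can be launched as a standalone Python snippet or script, is to execute each run in its own interpreter process, since a process's CUDA contexts are released when it exits. `run_in_fresh_interpreter` is a hypothetical helper name and the 600-second timeout is an arbitrary illustration.

```python
import subprocess
import sys


def run_in_fresh_interpreter(code: str, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run `code` in a new Python process and return the completed process."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Placeholder workload; a real job would import the framework and run
    # the GPU work inside the child process.
    done = run_in_fresh_interpreter("print('gpu job placeholder')")
    print(done.returncode, done.stdout.strip())
```

This does not help when another process is holding the memory or when the driver itself is in a bad state; those cases still need the process kill or reboot described under Workarounds.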