CUBLAS_STATUS_ALLOC_FAILED cuda runtime_error ai_generated partial

CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

ID: cuda/cublas-api-error-on-shutdown


Fix Rate: 75%

Confidence: 85%

Evidence: 1

First Seen: 2023-03-15

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.8	active	—	—	—
CUDA 12.1	active	—	—	—
cuBLAS 11.11	active	—	—	—
cuBLAS 12.0	active	—	—	—

Root Cause

cuBLAS handle allocation fails due to insufficient GPU memory or driver state corruption, often triggered during rapid context creation/destruction or after a previous CUDA error left the device in an inconsistent state.

generic

Official Documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublascreate

Workarounds

70% success: Release cached GPU memory by calling `torch.cuda.empty_cache()` before creating new cuBLAS handles, then reinitialize the model in a fresh context. Note that `torch.cuda.reset_peak_memory_stats()` only resets the allocator's peak-usage counters; it neither frees memory nor resets the device.
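The first workaround can be sketched roughly as follows. This is a minimal illustration, not a guaranteed fix: the function name `probe_cublas_after_cleanup` is hypothetical, and it assumes that a small CUDA matrix multiply in PyTorch is enough to make cuBLAS create its handle. If the process is already in a corrupted state, the probe will likely keep failing and a fresh process is needed.

```python
import gc


def probe_cublas_after_cleanup():
    """Release PyTorch's cached GPU memory, then try a tiny matmul.

    Returns True if the matmul succeeds, False if it raises a RuntimeError
    (for example a cuBLAS error), and None if PyTorch or a CUDA device is
    not available. The function name is illustrative only.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    gc.collect()              # drop unreachable Python objects that still hold tensors
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver

    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free: {free_bytes / 2**20:.0f} MiB of {total_bytes / 2**20:.0f} MiB")

    try:
        a = torch.ones(2, 2, device="cuda")
        (a @ a).sum().item()  # matmul is assumed to go through cuBLAS; .item() waits for it
        return True
    except RuntimeError as exc:
        print(f"cuBLAS still failing: {exc}")
        return False
```

If the probe returns False even with plenty of free memory reported, that points toward stale process or driver state rather than memory pressure.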
90% success: Use `nvidia-smi` to identify the processes using the GPU, kill them, and restart the application. For persistent issues, reboot the machine to fully reset the GPU driver state.
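To find which processes to kill, the `nvidia-smi` query interface can be scripted. The sketch below is one possible approach; the helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical, and it assumes `nvidia-smi` is on the `PATH` when `list_gpu_processes` is called. On some platforms the memory column is reported as `[N/A]`, which is mapped to `None` here.

```python
import subprocess

# Query one CSV row per compute process: PID, process name, used memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.count(",") < 2:
            continue  # skip blank or unexpected lines
        pid_text, rest = line.split(",", 1)
        name, mem_text = rest.rsplit(",", 1)
        mem_text = mem_text.strip()
        rows.append({
            "pid": int(pid_text),
            "name": name.strip(),
            "used_mib": int(mem_text) if mem_text.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the compute processes it reports."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

Each returned PID can then be terminated with the operating system's usual tools (for example `kill <pid>` on Linux) before the application is restarted.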

Dead Ends

Common approaches that don't work:

80% fail: The previous CUDA context may still be alive, and residual allocations prevent new handle creation; a full GPU reset or process kill is needed.
90% fail: The error concerns cuBLAS handle allocation, not insufficient memory for tensors, although larger batch sizes do add to memory pressure.
70% fail: The issue is often runtime state corruption, not a missing library; a driver version mismatch can cause other errors, but this specific error persists.
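Since a lingering CUDA context in the current process is a recurring theme above, one way to guarantee a clean context is to run the GPU work in a separate interpreter, whose context is torn down when it exits. The following is a minimal sketch under that assumption; `run_in_fresh_interpreter` and `GPU_JOB` are hypothetical names, and it does not help when another process or the driver itself is holding the stale state.

```python
import subprocess
import sys


def run_in_fresh_interpreter(code, timeout=None):
    """Run `code` in a new Python process and return the CompletedProcess.

    The child gets its own CUDA context (if it uses CUDA at all), and that
    context is released when the child exits.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Hypothetical GPU job: it only touches CUDA when PyTorch and a device exist.
GPU_JOB = """
try:
    import torch
except ImportError:
    torch = None
if torch is not None and torch.cuda.is_available():
    a = torch.ones(2, 2, device="cuda")
    print((a @ a).sum().item())
else:
    print("no CUDA device visible")
"""

# Example call (left commented out so importing this sketch has no side effects):
# result = run_in_fresh_interpreter(GPU_JOB, timeout=120)
# print(result.returncode, result.stdout.strip(), result.stderr.strip())
```

A nonzero `returncode` together with the cuBLAS message in `stderr` would suggest the problem lies outside the calling process.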