CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2
ID: cuda/cublas-api-error-on-shutdown
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.8 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
| cuBLAS 11.11 | active | — | — | — |
| cuBLAS 12.0 | active | — | — | — |
Root cause analysis
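`cublasCreate_v2` returns `CUBLAS_STATUS_ALLOC_FAILED` when cuBLAS cannot allocate the resources it needs for a new handle. The usual causes are insufficient free GPU memory or corrupted driver state. The failure is often triggered by rapid context creation/destruction, or by an earlier CUDA error that left the device in an inconsistent state.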
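You can check whether memory pressure is the likely cause by reading free memory per GPU from `nvidia-smi`. The following is a minimal sketch under some assumptions:

- It relies on the `--query-gpu=index,memory.used,memory.total,memory.free` and `--format=csv,noheader,nounits` options, which report values in MiB.
- The helper names are hypothetical.
- The `min_free_mib` threshold is purely illustrative, not a documented cuBLAS requirement.

```python
import subprocess

# Columns requested from nvidia-smi; with `nounits`, memory values are plain MiB integers.
GPU_QUERY = "index,memory.used,memory.total,memory.free"


def parse_gpu_memory(csv_text):
    """Turn `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output
    into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        index, used, total, free = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(free),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={GPU_QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def low_memory_gpus(gpus, min_free_mib):
    """Indices of GPUs whose free memory is below a (hypothetical) threshold."""
    return [gpu["index"] for gpu in gpus if gpu["free_mib"] < min_free_mib]
```

For example, `low_memory_gpus(query_gpu_memory(), min_free_mib=1024)` flags devices with less than 1 GiB free. Memory held by PyTorch's caching allocator counts as used in these figures, even when no tensor occupies it.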
Official documentation
https://docs.nvidia.com/cuda/cublas/index.html#cublascreate
Solutions
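- Free cached GPU memory with `torch.cuda.empty_cache()` before creating new cuBLAS handles. This returns unused blocks held by PyTorch's caching allocator to the driver. (`torch.cuda.reset_peak_memory_stats()` only resets memory-statistics counters; it neither frees memory nor resets the device.) Then reinitialize the model. If an earlier CUDA error corrupted the context, do this in a fresh process.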
- Use `nvidia-smi` to identify the processes holding the GPU, terminate them, and restart the application. If the problem persists, reboot the machine to fully reset the GPU driver state.
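The cache-clearing step from the first solution can be wrapped in a retry helper. This is a minimal sketch, not a PyTorch API:

- `run_with_cublas_retry` and `torch_cleanup` are hypothetical names.
- It assumes PyTorch surfaces the failure as a `RuntimeError` whose message contains `CUBLAS_STATUS_ALLOC_FAILED`.

```python
import gc
import time

ALLOC_FAILED = "CUBLAS_STATUS_ALLOC_FAILED"


def run_with_cublas_retry(fn, cleanup, attempts=2, delay_s=0.0):
    """Call fn(). If it raises a RuntimeError mentioning CUBLAS_STATUS_ALLOC_FAILED,
    call cleanup() and try again, making at most `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError as err:
            # Re-raise unrelated errors, and the alloc failure once retries run out.
            if ALLOC_FAILED not in str(err) or attempt == attempts:
                raise
            cleanup()
            if delay_s > 0:
                time.sleep(delay_s)


def torch_cleanup():
    """Free memory PyTorch holds but does not use, so cuBLAS can allocate a handle."""
    import torch  # lazy import: the retry helper itself does not depend on torch

    gc.collect()  # drop unreachable objects that may still reference CUDA tensors
    torch.cuda.empty_cache()  # return unused cached blocks to the driver
```

A typical call would be `run_with_cublas_retry(lambda: model(batch), torch_cleanup)`, where `model` and `batch` stand in for your own objects. Clearing the cache only helps when memory pressure is the cause. If an earlier CUDA error corrupted the context, every retry fails the same way and the process has to be restarted.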
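To find the processes mentioned in the second solution, `nvidia-smi` can report per-process usage as CSV. This is a hedged sketch under some assumptions:

- It relies on the `--query-compute-apps=pid,process_name,used_memory` and `--format=csv,noheader,nounits` options.
- The helper names are hypothetical.
- Terminating anything is left to the caller, since a listed process may belong to another user or job.

```python
import subprocess

# Per-process columns requested from nvidia-smi.
APP_QUERY = "pid,process_name,used_memory"


def parse_compute_apps(csv_text):
    """Turn `nvidia-smi --query-compute-apps=... --format=csv,noheader,nounits`
    output into (pid, process_name, used_mib) tuples; used_mib is None when
    the driver reports it as unavailable (e.g. "[N/A]")."""
    procs = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)  # the process name itself may contain commas
        used = used.strip()
        procs.append((int(pid), name.strip(), int(used) if used.isdigit() else None))
    return procs


def list_gpu_processes():
    """Run nvidia-smi (requires an NVIDIA driver) and list compute processes."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-compute-apps={APP_QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

After reviewing the list, you can stop a process you own with `os.kill(pid, signal.SIGTERM)` before restarting the application.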
Ineffective attempts
Common but ineffective approaches:
- 80% failure rate: the previous CUDA context may still be alive, and its residual allocations prevent new handle creation; a full GPU reset or process kill is needed.
- 90% failure rate: the error concerns cuBLAS handle allocation rather than memory for tensors, although larger batch sizes do add to overall memory pressure.
- 70% failure rate: the cause is usually corrupted runtime state rather than a missing library. A driver version mismatch can produce other errors, but fixing it does not clear this specific one.