CUBLAS_STATUS_ALLOC_FAILED cuda resource_error ai_generated true

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

ID: cuda/cublas-alloc-failed-internal

78% fix rate

82% confidence

1 evidence item

First seen 2023-05-20

Version compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.8	active	—	—	—
CUDA 12.0	active	—	—	—
CUDA 12.1	active	—	—	—

Root cause analysis

The cuBLAS library failed to allocate memory for its internal resources when creating a handle (cublasCreate_v2). This is typically caused by insufficient free GPU memory, or by a CUDA context that is corrupted or exhausted.
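
To see whether free GPU memory is the limiting factor, the current process can ask the CUDA driver directly. The following is a minimal sketch, assuming PyTorch is installed; `describe_gpu_memory` is a hypothetical helper introduced here for illustration, while `torch.cuda.mem_get_info()` returns the free and total bytes on the current device.

```python
import torch


def describe_gpu_memory(free_bytes, total_bytes):
    """Format free/total byte counts as a one-line summary in MiB (hypothetical helper)."""
    mib = 2 ** 20
    used_bytes = total_bytes - free_bytes
    return (
        f"{used_bytes / mib:.0f} MiB used of {total_bytes / mib:.0f} MiB "
        f"({free_bytes / mib:.0f} MiB free)"
    )


if torch.cuda.is_available():
    # mem_get_info() reports device-wide numbers, so memory held by
    # other processes is counted as used.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(describe_gpu_memory(free_bytes, total_bytes))
else:
    print("No CUDA device visible to this process")
```

If very little memory is reported free before the model has even been loaded, another process is likely holding it.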

generic

Official documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t

Solutions

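Reduce GPU memory usage by decreasing the batch size or using gradient accumulation. For example, set batch_size=8 and accumulate gradients over 4 steps: call loss.backward() on every step, and only when (step+1) % 4 == 0 call optimizer.step() followed by optimizer.zero_grad(). Calling optimizer.zero_grad() before every loss.backward() would discard the accumulated gradients. Also call torch.cuda.empty_cache() after each epoch to return cached but unused blocks to the driver; it does not free memory held by live tensors.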
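
The accumulation pattern can be sketched as follows. This is an illustrative example rather than a drop-in training loop: the linear model, the random data, and the values of `num_batches` and the feature size are assumptions standing in for a real setup, and the code falls back to the CPU when no GPU is visible.

```python
import torch
from torch import nn

# Assumption: a toy model and random data stand in for the real training setup.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

batch_size = 8     # small per-step batch to keep peak memory low
accum_steps = 4    # effective batch size is 8 * 4 = 32
num_batches = 12   # hypothetical number of batches in one epoch

optimizer_steps = 0
optimizer.zero_grad()
for step in range(num_batches):
    inputs = torch.randn(batch_size, 16, device=device)
    targets = torch.randn(batch_size, 1, device=device)

    # Divide by accum_steps so the summed gradient matches the mean
    # over the effective batch.
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()  # gradients accumulate in .grad across calls

    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # reset only after the weight update
        optimizer_steps += 1

if torch.cuda.is_available():
    # Returns cached, currently unused blocks to the driver.
    torch.cuda.empty_cache()
```

Dividing the loss by `accum_steps` is a design choice that keeps the gradient scale comparable to a single batch of 32; without it, the learning rate would effectively be multiplied by 4.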

Restart the Python process and make sure no other process is using the GPU. Restarting also discards a corrupted CUDA context. Use 'nvidia-smi' to check memory usage, and stop competing processes that you own with 'kill -9 <PID>'. Then re-run the code.
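
The 'nvidia-smi' check can also be scripted. The sketch below assumes 'nvidia-smi' is on the PATH and uses its CSV query mode; `parse_compute_apps` and `list_gpu_processes` are hypothetical helper names introduced here. It only lists processes and leaves the decision to stop any of them to the user.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into (pid, name, MiB) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip malformed or empty lines
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        pid, name, mem = pid.strip(), name.strip(), mem.strip()
        if not pid.isdigit():
            continue
        # Memory can be reported as a non-numeric placeholder such as [N/A].
        mem_mib = int(mem) if mem.isdigit() else None
        rows.append((int(pid), name, mem_mib))
    return rows


def list_gpu_processes():
    """Return compute processes currently holding GPU memory, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, mem_mib in list_gpu_processes():
        print(f"PID {pid}: {name} ({mem_mib} MiB)")
```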

Ineffective attempts

Common but ineffective approaches:

90% failure rate
Increasing batch size in the model makes the problem worse by consuming more GPU memory, not less.
70% failure rate
Setting torch.backends.cudnn.enabled = False disables cuDNN but cuBLAS is still used internally; this doesn't free the memory needed by cuBLAS.