CUBLAS_STATUS_ALLOC_FAILED
cuda
resource_error
ai_generated
true
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2
ID: cuda/cublas-alloc-failed-internal
Fix rate: 78%
Confidence: 82%
Evidence count: 1
First seen: 2023-05-20
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.8 | active | — | — | — |
| CUDA 12.0 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
Root cause analysis
The cuBLAS library failed to allocate internal memory while creating its handle, typically because GPU memory is insufficient or the CUDA context is corrupted or exhausted.
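Since insufficient GPU memory is the most common cause, it can help to check how much memory is free on each device before launching the job. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports plain integer MiB values. The helper names (`parse_gpu_memory`, `gpu_memory`, `low_memory_gpus`) and the 512 MiB threshold are hypothetical choices for illustration, not part of any library.

```python
import subprocess

# Real nvidia-smi query flags; with "nounits" the memory columns are integers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def low_memory_gpus(rows, min_free_mib=512):
    """Return indices of GPUs with less than `min_free_mib` free (hypothetical threshold)."""
    return [index for index, used, total in rows if total - used < min_free_mib]
```

Calling `low_memory_gpus(gpu_memory())` on the affected machine lists the devices that are close to full. A device that appears there is a likely candidate for the allocation failure, although a corrupted context can produce the same error even when memory is available.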
Official documentation
https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t
Solutions
- Reduce GPU memory usage by decreasing the batch size or using gradient accumulation. For example, set batch_size=8 and accumulate gradients over 4 steps: call loss.backward() on every batch, and call optimizer.step() followed by optimizer.zero_grad() only when (step + 1) % 4 == 0. Calling optimizer.zero_grad() before every backward pass would discard the accumulated gradients. Also release cached blocks with torch.cuda.empty_cache() after each epoch.
- Restart the Python process and make sure no other process is using the GPU. Run 'nvidia-smi' to check memory usage, terminate competing processes with 'kill -9 <PID>', then re-run the code.
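The gradient accumulation pattern from the first solution can be sketched as follows. This is an illustrative outline rather than a definitive training loop: `train_one_epoch` and `demo` are hypothetical names, the tiny linear model and random data are placeholders, and dividing the loss by `accum_steps` is an added assumption so that the accumulated gradient matches the average over one larger batch.

```python
def train_one_epoch(model, loader, optimizer, loss_fn, accum_steps=4):
    """Run one epoch, updating weights once every `accum_steps` batches.

    Returns the number of optimizer updates performed.
    """
    optimizer.zero_grad()
    num_updates = 0
    for step, (inputs, targets) in enumerate(loader):
        loss = loss_fn(model(inputs), targets)
        # backward() adds to the existing gradients, so they accumulate across batches.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()  # reset only after the update, not before each backward
            num_updates += 1
    # Batches left over after the last full group are not applied in this sketch.
    return num_updates


def demo():
    """Placeholder usage with PyTorch; falls back to CPU when no GPU is present."""
    import torch  # imported here so the loop above has no hard dependency

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(16, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Eight small batches of size 8, matching batch_size=8 with 4 accumulation steps.
    loader = [
        (torch.randn(8, 16, device=device), torch.randn(8, 1, device=device))
        for _ in range(8)
    ]
    for _ in range(2):
        train_one_epoch(model, loader, optimizer, torch.nn.functional.mse_loss)
        if device == "cuda":
            # Return cached, unused blocks to the driver after each epoch.
            torch.cuda.empty_cache()
```

With 8 batches and `accum_steps=4`, each epoch performs two optimizer updates while only one batch of 8 samples is resident on the GPU at a time.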
Ineffective attempts
Common but ineffective approaches:
- 90% failure rate: Increasing the batch size makes the problem worse because it consumes more GPU memory, not less.
- 70% failure rate: Setting torch.backends.cudnn.enabled = False disables cuDNN, but cuBLAS is still used internally, so this does not free the memory cuBLAS needs.