CUBLAS_STATUS_ALLOC_FAILED
cuda
resource_error
ai_generated
true
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2
ID: cuda/cublas-alloc-failed-internal
Fix Rate: 78%
Confidence: 82%
Evidence: 1
First Seen: 2023-05-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.8 | active | — | — | — |
| CUDA 12.0 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
Root Cause
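The cuBLAS library failed to allocate its internal resources, typically because GPU memory is insufficient or the CUDA context is corrupted or exhausted. In the error above, the failure is raised by cublasCreate_v2, the call that creates the cuBLAS handle.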
Official Documentation
https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t
Workarounds
-
80% success
Reduce GPU memory usage by decreasing the batch size or using gradient accumulation. For example, set batch_size=8 and accumulate gradients over 4 steps: call (loss / 4).backward() on every step and, when (step + 1) % 4 == 0, call optimizer.step() followed by optimizer.zero_grad(). Calling optimizer.zero_grad() before every backward pass would discard the accumulated gradients. Also call torch.cuda.empty_cache() after each epoch; it returns PyTorch's unused cached blocks to the driver but does not free tensors that are still referenced.
-
75% success
Restart the Python process and make sure no other process is holding GPU memory. Run 'nvidia-smi' to check memory usage, stop competing processes with 'kill <PID>' (falling back to 'kill -9 <PID>' only if a process does not exit), then re-run the code.
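The gradient accumulation workaround can be sketched as follows. This is a minimal illustration, not a drop-in training loop: the function name train_with_accumulation, the toy nn.Linear model, and the random batches are hypothetical, and the loss is divided by the number of accumulation steps on the assumption that the loss function averages over each batch.

```python
import torch
from torch import nn


def train_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=4):
    """Run one pass over `batches`, updating weights every `accum_steps` batches.

    Returns the number of optimizer updates performed.
    """
    model.train()
    optimizer.zero_grad()
    updates = 0
    for step, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets)
        # Scale the loss so the accumulated gradient matches the average
        # over the larger effective batch (assumes a mean-reduced loss).
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            # Clear gradients only after the update, never before backward().
            optimizer.zero_grad()
            updates += 1
    # Note: batches left over after the last full group are not applied here.
    return updates


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 2).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # Hypothetical data: 8 small batches of 8 samples each.
    batches = [
        (torch.randn(8, 16, device=device),
         torch.randint(0, 2, (8,), device=device))
        for _ in range(8)
    ]
    print(train_with_accumulation(model, optimizer, nn.CrossEntropyLoss(), batches))
    if torch.cuda.is_available():
        # Release unused cached blocks back to the driver after the epoch.
        torch.cuda.empty_cache()
```

With a batch size of 8 and 4 accumulation steps, each update sees an effective batch of 32 samples while peak activation memory stays at the level of a single batch of 8.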
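Checking for competing GPU processes can also be scripted. The sketch below assumes nvidia-smi is on the PATH and supports the --query-compute-apps and --format=csv,noheader,nounits options; the helper names parse_compute_apps and list_gpu_processes are hypothetical. It only lists processes and leaves the decision to stop them to the user.

```python
import subprocess

# Query PID, process name, and GPU memory in use (MiB) for compute processes.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_memory' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = line.split(",")
        if len(parts) < 3:
            continue
        pid = parts[0].strip()
        used = parts[-1].strip()
        # Rejoin the middle fields in case the process name contains commas.
        name = ",".join(parts[1:-1]).strip()
        if not pid.isdigit():
            continue
        rows.append({
            "pid": int(pid),
            "name": name,
            # Some environments report "[N/A]" instead of a number.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Return compute processes reported by nvidia-smi, or [] if unavailable."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc["pid"], proc["name"], proc["used_mib"])
```

Any PID in the output that does not belong to the current job is a candidate for the restart-and-retry workaround above.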
Dead Ends
Common approaches that don't work:
-
90% fail
Increasing batch size in the model makes the problem worse by consuming more GPU memory, not less.
-
70% fail
Setting torch.backends.cudnn.enabled = False disables cuDNN but cuBLAS is still used internally; this doesn't free the memory needed by cuBLAS.