CUBLAS_STATUS_ALLOC_FAILED cuda resource_error ai_generated true

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

ID: cuda/cublas-alloc-failed-internal

Fix rate: 78%
Confidence: 82%
Evidence count: 1
First seen: 2023-05-20

Version compatibility

Version     Status   Introduced   Deprecated   Notes
CUDA 11.8   active
CUDA 12.0   active
CUDA 12.1   active

Root cause analysis

cuBLAS failed to allocate its internal resources while creating a handle (cublasCreate_v2), typically because the GPU has too little free memory or because the CUDA context is corrupted or exhausted.

generic
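
To check whether insufficient GPU memory is a plausible cause, it can help to see which processes currently hold GPU memory. The following is a minimal sketch, assuming nvidia-smi is on the PATH and supports the --query-compute-apps CSV query. The helper names parse_compute_apps and list_gpu_processes are hypothetical and not part of any library.

```python
import shutil
import subprocess

# nvidia-smi query: one CSV row per compute process, memory in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # Split from both ends so a comma inside the process name is tolerated.
            pid_str, rest = line.split(",", 1)
            name, mem_str = rest.rsplit(",", 1)
            pid = int(pid_str)
        except ValueError:
            continue  # skip rows that do not match the expected three fields
        mem_str = mem_str.strip()
        # Some drivers report "[N/A]" instead of a number; keep that as None.
        used_mib = int(mem_str) if mem_str.isdigit() else None
        rows.append((pid, name.strip(), used_mib))
    return rows


def list_gpu_processes():
    """Return current GPU compute processes, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    procs = sorted(list_gpu_processes(), key=lambda r: r[2] or 0, reverse=True)
    for pid, name, used_mib in procs:
        print(f"pid={pid} name={name} used_mib={used_mib}")
```

If one process already holds most of the device memory, cublasCreate_v2 in a second process is a likely place for the allocation to fail.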

Official documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t

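Solutions

  1. Reduce GPU memory usage by decreasing the batch size, optionally combined with gradient accumulation to keep the effective batch size. For example, set batch_size=8 and accumulate gradients over 4 steps: call loss.backward() on every step, and when (step+1) % 4 == 0 call optimizer.step() followed by optimizer.zero_grad(). Calling optimizer.zero_grad() before every backward pass would discard the accumulated gradients. Calling torch.cuda.empty_cache() after each epoch returns unused cached blocks to the driver so that other CUDA libraries and processes can use them; it does not free memory held by live tensors.
  2. Restart the Python process and make sure no other process is using the GPU. Run 'nvidia-smi' to check memory usage and stop competing processes, using 'kill -9 <PID>' if they do not exit on their own. Then re-run the code.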
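
The gradient accumulation described in the first solution can be sketched as follows. This is a minimal illustration, not a definitive training loop: the function name train_one_epoch, the toy nn.Linear model, and the random data are assumptions made for the example, and the loss is divided by accum_steps so the accumulated gradient approximates the average over the larger effective batch.

```python
import torch
from torch import nn


def train_one_epoch(model, optimizer, loss_fn, batches, accum_steps=4):
    """Run one epoch, stepping the optimizer once every `accum_steps` micro-batches.

    Returns the number of optimizer steps taken.
    """
    model.train()
    optimizer.zero_grad()
    optimizer_steps = 0
    for step, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets)
        # Scale the loss so the summed gradients average over the micro-batches.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()  # clear gradients only after the update
            optimizer_steps += 1
    # Note: gradients from a trailing partial group are not applied in this sketch.
    if torch.cuda.is_available():
        # Return unused cached blocks to the driver; live tensors are unaffected.
        torch.cuda.empty_cache()
    return optimizer_steps


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(16, 1).to(device)  # toy model, assumption for illustration
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    batches = [
        (torch.randn(8, 16, device=device), torch.randn(8, 1, device=device))
        for _ in range(8)
    ]
    print(train_one_epoch(model, optimizer, nn.MSELoss(), batches))
```

With batch_size=8 and accum_steps=4, each optimizer update sees gradients from 32 samples while only 8 samples' activations are resident on the GPU at a time.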

Ineffective attempts

Common but ineffective approaches:

  1. 90% failure rate

    Increasing the batch size makes the problem worse because it consumes more GPU memory, not less.

  2. 70% failure rate

    Setting torch.backends.cudnn.enabled = False disables cuDNN, but cuBLAS is still used internally, so this does not free the memory that cuBLAS needs.