# RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

- **ID:** `cuda/cublas-alloc-failed-internal`
- **Domain:** cuda
- **Category:** resource_error
- **Error code:** `CUBLAS_STATUS_ALLOC_FAILED`
- **Verification level:** ai_generated
- **Fix rate:** 78%

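## Root Cause

cuBLAS could not allocate its internal memory while creating a handle, usually because the GPU has too little free memory or because the CUDA context is corrupted or exhausted.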
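
To see whether low free GPU memory is the likely cause, you can query `nvidia-smi` before the failing call. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` and reports plain integer values in MiB. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and only serve this illustration.

```python
import subprocess

# Ask nvidia-smi for per-GPU free and total memory as bare CSV (values in MiB).
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.free,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines of 'index, free_MiB, total_MiB' into a list of dicts.

    Assumes every field is a plain integer; drivers that report
    placeholders such as '[N/A]' would need extra handling.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, free, total = (field.strip() for field in line.split(","))
        gpus.append(
            {"index": int(index), "free_mib": int(free), "total_mib": int(total)}
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` on the affected machine returns one entry per GPU. A GPU with very little free memory points to memory pressure rather than a corrupted context.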

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| CUDA 11.8 | active | — | — |
| CUDA 12.0 | active | — | — |
| CUDA 12.1 | active | — | — |

## Solutions

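1. ```
   Reduce GPU memory usage by decreasing the batch size or by using gradient accumulation. For example, set batch_size=8 and accumulate gradients over 4 steps: call loss.backward() on every step, and only when (step+1) % 4 == 0 call optimizer.step() followed by optimizer.zero_grad(). Do not call optimizer.zero_grad() before every backward pass, because that discards the accumulated gradients. Also release cached memory with torch.cuda.empty_cache() after each epoch.
   ```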
2. ```
   Restart the Python process, which discards a possibly corrupted CUDA context, and ensure no other processes are using the GPU. Run 'nvidia-smi' to check memory usage, kill competing processes with 'kill -9 <PID>', then re-run the code.
   ```
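
The gradient accumulation described in solution 1 can be sketched as a small training loop. This is an illustration under stated assumptions, not a definitive implementation. The function name `train_one_epoch` and its parameters are hypothetical, the objects passed in are assumed to behave like a PyTorch module, loss function and optimizer, and both the loss scaling and the final flush of leftover batches are additions not spelled out in the steps above.

```python
def train_one_epoch(model, loss_fn, optimizer, batches, accum_steps=4):
    """Run one epoch with gradient accumulation.

    Gradients from `accum_steps` consecutive small batches are summed
    before each optimizer update. Returns the number of optimizer
    updates performed.
    """
    optimizer.zero_grad()  # start the epoch with clean gradients
    num_updates = 0
    pending = 0  # batches accumulated since the last update

    for inputs, targets in batches:
        # Scale the loss so the summed gradient matches the average
        # over the larger effective batch (assumption: mean-reduced loss).
        loss = loss_fn(model(inputs), targets) / accum_steps
        loss.backward()  # adds to the existing gradients
        pending += 1

        if pending == accum_steps:
            optimizer.step()
            optimizer.zero_grad()  # reset only after the update
            num_updates += 1
            pending = 0

    # Apply any leftover gradients from an incomplete final group.
    if pending:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1

    return num_updates
```

With PyTorch, `model` would typically be an `nn.Module`, `optimizer` an instance from `torch.optim`, and `batches` a `DataLoader` built with `batch_size=8`, giving an effective batch size of 32 at the memory cost of 8.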

## Ineffective Attempts

- **Increasing the batch size:** This makes the problem worse because it consumes more GPU memory, not less. (90% failure rate)
- **Setting `torch.backends.cudnn.enabled = False`:** This disables cuDNN, but cuBLAS is still used internally, so it does not free the memory cuBLAS needs. (70% failure rate)