# CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate_v2

- **ID:** `cuda/cublas-api-error-on-shutdown`
- **Domain:** cuda
- **Category:** runtime_error
- **Error Code:** `CUBLAS_STATUS_ALLOC_FAILED`
- **Verification:** ai_generated
- **Fix Rate:** 75%

## Root Cause

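`cublasCreate_v2` returns `CUBLAS_STATUS_ALLOC_FAILED` when it cannot allocate the resources a new cuBLAS handle needs. This usually means the GPU is short on memory or the driver state is corrupted. It is often triggered by rapid context creation/destruction, or by an earlier CUDA error that left the device in an inconsistent state.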
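
When the error surfaces through PyTorch, it is typically raised as a `RuntimeError` whose message names the status code. The sketch below helps tell the two causes apart by checking the message and then looking at how much device memory is actually free. It assumes PyTorch is the framework in use, and the helper names are hypothetical.

```python
# Hypothetical diagnostic helpers; assumes PyTorch surfaces the cuBLAS error.
try:
    import torch
except ImportError:  # let the helpers load even where PyTorch is absent
    torch = None

ALLOC_FAILED = "CUBLAS_STATUS_ALLOC_FAILED"


def is_cublas_alloc_failure(exc: BaseException) -> bool:
    """True if the exception message names the cuBLAS allocation failure."""
    return ALLOC_FAILED in str(exc)


def free_gpu_memory(device: int = 0):
    """Return (free_bytes, total_bytes) for `device`, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    # Driver-level view: memory cached by PyTorch's allocator counts as used.
    return torch.cuda.mem_get_info(device)
```

Call `free_gpu_memory()` in the `except` branch. Near-zero free memory points toward memory pressure, while plenty of free memory points toward a corrupted context. After a sticky CUDA error the query itself may raise, which suggests that only a fresh process will help.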

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| CUDA 11.8 | active | — | — |
| CUDA 12.1 | active | — | — |
| cuBLAS 11.11 | active | — | — |
| cuBLAS 12.0 | active | — | — |

## Workarounds

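1. **Release PyTorch's cached GPU memory with `torch.cuda.empty_cache()` before new cuBLAS handles are created, then reinitialize the model.** `torch.cuda.reset_peak_memory_stats()` only resets PyTorch's memory-usage counters and does not reset the device, so it does not help here. If an earlier CUDA error has corrupted the context, the memory cannot be recovered in-process; reinitialize in a fresh process instead. (70% success)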
2. **Find the processes holding the GPU with `nvidia-smi`, terminate them, and restart the application. If the issue persists, reboot the machine to fully reset the GPU driver state.** (90% success)
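
Workaround 1 can be sketched as a retry wrapper. This assumes PyTorch; `run_with_cublas_retry` is a hypothetical helper, and it can only help when the failure comes from memory pressure, not from a corrupted context.

```python
import gc

try:
    import torch
except ImportError:  # allow use (and testing) without PyTorch installed
    torch = None


def run_with_cublas_retry(fn, *args, **kwargs):
    """Call fn; on a cuBLAS allocation failure, free cached memory and retry once.

    Hypothetical helper. A sticky CUDA error cannot be cleared in-process,
    so a second failure should be handled by restarting the process.
    """
    try:
        return fn(*args, **kwargs)
    except RuntimeError as exc:
        if "CUBLAS_STATUS_ALLOC_FAILED" not in str(exc):
            raise  # unrelated error: do not mask it
        gc.collect()  # drop unreachable Python objects that may still own tensors
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # return unused cached blocks to the driver
        return fn(*args, **kwargs)  # single retry; a repeat failure propagates
```

For example, wrap the first forward pass with `out = run_with_cublas_retry(model, batch)`. Retrying only once keeps a genuinely broken context from looping forever.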

## Dead Ends

- The previous CUDA context may still be alive, and its residual allocations prevent new handle creation; a full GPU reset or process kill is needed. (80% fail)
- The error concerns handle allocation rather than insufficient memory for tensors, although larger batch sizes do add to memory pressure. (90% fail)
- The issue is often runtime state corruption rather than a missing library; a driver version mismatch can cause other errors, but this specific error persists. (70% fail)
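
Both the first dead end and workaround 2 come down to other processes still holding the GPU. The sketch below lists them, assuming `nvidia-smi` is on `PATH`; the helper names are hypothetical. `nvidia-smi` only reports the processes, so terminating them is left to the caller.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text: str):
    """Parse `--query-compute-apps=pid,used_memory --format=csv,noheader,nounits`
    output into (pid, used_MiB) tuples; non-numeric memory becomes None."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip anything that is not a pid,memory row
        pid, used = fields
        apps.append((int(pid), int(used) if used.isdigit() else None))
    return apps


def gpu_compute_apps():
    """Return processes currently holding GPU memory, or None without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

Confirm that a listed PID is stale before terminating it (for example with `kill <pid>`). On some platforms `used_memory` reads `[N/A]`, which the parser maps to `None`.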
