# RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasLtMatmulAlgoGetHeuristic

- **ID:** `cuda/cublas-alloc-failed-cublaslt`
- **Domain:** cuda
- **Category:** resource_error
- **Error Code:** `CUBLAS_STATUS_ALLOC_FAILED`
- **Verification:** ai_generated
- **Fix Rate:** 78%

## Root Cause

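cuBLAS returns `CUBLAS_STATUS_ALLOC_FAILED` when a resource allocation inside the library fails, usually because a `cudaMalloc()` call failed. Here the failure happens while cuBLASLt runs its heuristic search for a matrix multiplication algorithm. The usual cause is insufficient free GPU memory, from either memory fragmentation or a large workspace requirement. In PyTorch, the caching allocator keeps freed blocks reserved for reuse, so memory that no tensor is using can still be unavailable to cuBLAS.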
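
To tell a cache-pressure failure from a genuine memory shortage, compare the free memory reported by the CUDA driver with what PyTorch's caching allocator has reserved. The sketch below is a minimal illustration; `memory_headroom` and `snapshot` are hypothetical helper names, and `snapshot` assumes a CUDA-enabled PyTorch build. It uses only `torch.cuda.mem_get_info`, `torch.cuda.memory_reserved`, and `torch.cuda.memory_allocated`.

```python
"""Sketch: compare driver-free memory with PyTorch's cached-but-unused memory.

memory_headroom() and snapshot() are hypothetical helpers, not library APIs.
snapshot() assumes PyTorch is installed with CUDA support.
"""
from typing import Dict

MIB = 1024 ** 2


def memory_headroom(free: int, total: int, reserved: int, allocated: int) -> Dict[str, float]:
    """Convert raw byte counts into MiB figures that are easy to compare."""
    return {
        "driver_free_mib": free / MIB,                      # free according to the CUDA driver
        "total_mib": total / MIB,                           # device capacity
        "allocated_mib": allocated / MIB,                   # backing live tensors
        "cached_unused_mib": (reserved - allocated) / MIB,  # held by PyTorch, not backing tensors
    }


def snapshot(device: int = 0) -> Dict[str, float]:
    """Query a live device; requires a CUDA-enabled PyTorch build."""
    import torch  # imported lazily so memory_headroom() works without torch

    free, total = torch.cuda.mem_get_info(device)
    return memory_headroom(
        free,
        total,
        torch.cuda.memory_reserved(device),
        torch.cuda.memory_allocated(device),
    )
```

If `driver_free_mib` is small while `cached_unused_mib` is large, releasing the cache (workaround 2) is the likelier fix; if both are small, reducing memory usage (workaround 1) is.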

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| CUDA 12.0 | active | — | — |
| CUDA 12.3 | active | — | — |
| cuBLASLt 0.8 | active | — | — |
| PyTorch 2.2 | active | — | — |

## Workarounds

1. **Reduce memory usage by lowering the batch size or using gradient checkpointing.** (80% success) Freeing activation memory leaves room for the allocations cuBLASLt makes during the heuristic search. Note that `torch.utils.checkpoint.checkpoint` returns the wrapped module's outputs, not a new model:
   ```python
   from torch.utils.checkpoint import checkpoint
   outputs = checkpoint(model, *inputs, use_reentrant=False)  # activations recomputed during backward
   ```
2. **Clear the GPU cache before the operation.** (70% success) `torch.cuda.empty_cache()` returns cached but unused blocks from PyTorch's caching allocator to the driver, which can free the contiguous memory that allocations made directly by cuBLAS need. It does not move live tensors, so it cannot undo fragmentation among blocks that are still allocated.
   ```python
   import torch
   torch.cuda.empty_cache()  # release unused cached blocks before the matmul
   ```
3. **Restrict the heuristic algorithm search with the environment variable `CUBLASLT_HEURISTIC_MODE=1`.** (75% success) This is intended to reduce the workspace allocated during the heuristic search. Set it before the process starts, and confirm that your cuBLASLt release documents this variable before relying on it.
   ```shell
   export CUBLASLT_HEURISTIC_MODE=1
   ```
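
Workarounds 1 and 2 can be combined into a retry policy: on the first failure, release the cache and retry at the same batch size; on later failures, halve the batch size. The sketch below assumes the error surfaces as a `RuntimeError` whose message contains `CUBLAS_STATUS_ALLOC_FAILED`, as in the title of this entry. `retry_on_cublas_alloc`, `run_step`, and `release_cache` are hypothetical names; in PyTorch you would pass `torch.cuda.empty_cache` as `release_cache`.

```python
"""Sketch: retry a step after CUBLAS_STATUS_ALLOC_FAILED.

retry_on_cublas_alloc, run_step and release_cache are hypothetical names.
Assumes the failure is raised as a RuntimeError containing the status string.
"""
from typing import Callable, TypeVar

T = TypeVar("T")
ALLOC_FAILED = "CUBLAS_STATUS_ALLOC_FAILED"


def retry_on_cublas_alloc(
    run_step: Callable[[int], T],
    batch_size: int,
    release_cache: Callable[[], None],
    min_batch_size: int = 1,
) -> T:
    """Call run_step(batch_size), backing off on the cuBLAS allocation error."""
    cleared = False
    while True:
        try:
            return run_step(batch_size)
        except RuntimeError as err:
            if ALLOC_FAILED not in str(err):
                raise  # unrelated error: do not mask it
            if not cleared:
                release_cache()  # first retry: same batch size, cache released
                cleared = True
                continue
            if batch_size // 2 < min_batch_size:
                raise  # cannot shrink the batch any further
            batch_size //= 2  # later retries: halve the batch
            release_cache()
```

Here `run_step` would build a batch of the given size and run the forward and backward pass. Whether a smaller batch is acceptable mid-training (for example, with respect to learning-rate scaling) depends on your setup.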

## Dead Ends

- **Increasing the batch size:** Larger batches increase memory usage and make the allocation failure worse. The error stems from insufficient memory for the workspace, not from GPU underutilization. (95% fail)
- **Adjusting the cuBLAS workspace configuration:** The workspace setting controls the size of the internal buffer but does not directly fix allocation failures during the heuristic search; a larger workspace can even increase memory pressure. (80% fail)
- **Disabling cuBLASLt:** cuBLASLt is the default path for some operations. Disabling it may fall back to cuBLAS, but this can degrade performance or trigger different errors. (50% fail)
