Tags: CUBLAS_STATUS_ARCH_MISMATCH, cuda, runtime_error (ai_generated: true)

RuntimeError: CUDA error: CUBLAS_STATUS_ARCH_MISMATCH when calling cublasSgemm

ID: cuda/cublas-api-not-found

Fix Rate: 82%

Confidence: 85%

Evidence: 1

First Seen: 2023-05-12

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.8	active	—	—	—
cuBLAS 11.11	active	—	—	—
PyTorch 2.0.1	active	—	—	—
NVIDIA Driver 525.85.05	active	—	—	—
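
To compare a local setup against the versions listed above, something like the following sketch could be used. It assumes PyTorch may or may not be installed and that `nvidia-smi` may or may not be on the PATH; `collect_versions` is a hypothetical helper name, not part of any library. The cuBLAS version is not covered here.

```python
import subprocess


def collect_versions():
    """Return the local PyTorch version, the CUDA version PyTorch was built with,
    and the NVIDIA driver version. Anything that cannot be determined is None."""
    info = {"torch": None, "cuda": None, "driver": None}

    try:
        import torch  # optional: only needed for the torch/cuda fields
        info["torch"] = str(torch.__version__)
        info["cuda"] = torch.version.cuda  # None on CPU-only builds
    except (ImportError, OSError):
        pass  # PyTorch missing or its shared libraries failed to load

    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=False,
        )
        lines = result.stdout.strip().splitlines()
        if result.returncode == 0 and lines:
            info["driver"] = lines[0].strip()  # first GPU's driver version
    except (OSError, subprocess.TimeoutExpired):
        pass  # nvidia-smi missing or unresponsive

    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```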

Root Cause

The GPU's compute capability is too low for the cuBLAS kernel being invoked, typically because the code was compiled for sm_80+ but the GPU only supports sm_70 or earlier.
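
The mismatch described above can be checked directly. The sketch below compares a device's compute capability against the architectures a PyTorch build was compiled for, using `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names `parse_arch` and `device_is_covered` are hypothetical. The compatibility rule is a simplification: an sm_XY binary is assumed to run on devices with the same major version and an equal or higher minor version, compute_XY PTX is assumed to be JIT-compilable for any equal or newer capability, and suffixed targets such as sm_90a are skipped.

```python
import re


def parse_arch(tag):
    """Turn 'sm_80' into ('sm', (8, 0)); return None for tags this sketch does not handle."""
    match = re.fullmatch(r"(sm|compute)_(\d+)", tag)
    if match is None:
        return None
    number = int(match.group(2))
    return match.group(1), (number // 10, number % 10)


def device_is_covered(capability, arch_list):
    """True if at least one compiled target can run on a device with this capability."""
    major, minor = capability
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # unknown or suffixed tag, ignored (assumption)
        kind, (t_major, t_minor) = parsed
        # Binary code: same major version, compiled minor not above the device's.
        if kind == "sm" and t_major == major and t_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (t_major, t_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        print("device capability:", capability)
        print("compiled targets:", arch_list)
        print("covered:", device_is_covered(capability, arch_list))
    else:
        print("PyTorch is not installed or no CUDA device is usable.")
```

If `covered` comes out false, the installed build has no kernels for the GPU, which matches the root cause given here.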

generic

Official Documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t

Workarounds

70% success
```
export CUBLAS_WORKSPACE_CONFIG=":4096:8" && python your_script.py
```
Note: CUBLAS_WORKSPACE_CONFIG sets the cuBLAS workspace configuration; it does not change which architectures the installed binaries target.
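85% success
```
export TORCH_CUDA_ARCH_LIST='7.0;7.5' && pip install --no-cache-dir torch --verbose
```
Note: TORCH_CUDA_ARCH_LIST is read only when PyTorch or a CUDA extension is compiled from source. A prebuilt wheel installed by pip is not recompiled, so this command changes the architecture targets only if pip ends up building from source.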
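
When building from source, the value for TORCH_CUDA_ARCH_LIST can be derived from the GPUs actually present rather than typed by hand. This is a minimal sketch, assuming PyTorch is importable somewhere on the machine so the devices can be queried; `arch_list_value` is a hypothetical helper, and the optional `+PTX` suffix follows the format PyTorch's build accepts.

```python
def arch_list_value(capabilities, with_ptx=False):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST string, e.g. '7.0;7.5'."""
    unique = sorted(set(capabilities))  # drop duplicates from identical GPUs
    parts = [f"{major}.{minor}" for major, minor in unique]
    if with_ptx and parts:
        parts[-1] += "+PTX"  # also embed PTX for the newest listed architecture
    return ";".join(parts)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        caps = [
            torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())
        ]
        print(f"export TORCH_CUDA_ARCH_LIST='{arch_list_value(caps)}'")
    else:
        print("PyTorch is not installed or no CUDA device is usable.")
```

For a machine with one sm_70 and one sm_75 GPU, this yields the same `7.0;7.5` value used in the workaround.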

Dead Ends

Common approaches that don't work:

90% fail
Reinstallation does not change the GPU hardware or the compiled architecture targets; the mismatch persists.
85% fail
Driver updates do not alter cuBLAS library architecture requirements; the kernel still expects a higher compute capability.