CUBLAS_STATUS_ARCH_MISMATCH
cuda
runtime_error
ai_generated
true
RuntimeError: CUDA error: CUBLAS_STATUS_ARCH_MISMATCH when calling cublasSgemm
ID: cuda/cublas-api-not-found
Fix Rate: 82%
Confidence: 85%
Evidence: 1
First Seen: 2023-05-12
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.8 | active | — | — | — |
| cuBLAS 11.11 | active | — | — | — |
| PyTorch 2.0.1 | active | — | — | — |
| NVIDIA Driver 525.85.05 | active | — | — | — |
Root Cause
The GPU's compute capability is too low for the cuBLAS kernel being invoked, typically because the code was compiled for sm_80+ but the GPU only supports sm_70 or earlier.
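One way to check for the mismatch described above is to compare the GPU's compute capability with the architectures a build targets. The sketch below is illustrative, not an official diagnostic: `parse_arch` and `gpu_is_covered` are hypothetical helper names, and it inspects the architecture list PyTorch reports for its own kernels, which is assumed here to be a reasonable proxy for the bundled cuBLAS. The PyTorch calls run only if `torch` is installed and a GPU is visible.

```python
import re


def parse_arch(arch):
    """Turn 'sm_80' or 'compute_80' into (8, 0); return None otherwise."""
    m = re.fullmatch(r"(?:sm|compute)_(\d+)(\d)[a-z]?", arch)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def gpu_is_covered(capability, arch_list):
    """Return True if at least one compiled target can run on this GPU.

    Assumed rule: an 'sm_XY' binary runs on devices with the same major
    version and an equal or higher minor version; 'compute_XY' PTX can be
    JIT-compiled for any device with capability >= X.Y.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind = arch.split("_")[0]
        if kind == "sm" and parsed[0] == major and parsed[1] <= minor:
            return True
        if kind == "compute" and parsed <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("compiled targets:", archs)
        print("covered:", gpu_is_covered(cap, archs))
    else:
        print("PyTorch with a visible CUDA device is needed for the live check.")
```

For the case named in the root cause, `gpu_is_covered((7, 0), ["sm_80", "sm_86"])` returns `False`: an sm_70 device cannot run binaries built only for sm_80 and later.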
Official Documentation
https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t
Workarounds
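- 70% success: export CUBLAS_WORKSPACE_CONFIG=":4096:8" && python your_script.py
  Note: this variable configures the cuBLAS workspace size; it does not change which architectures the library was compiled for, so any effect on this error is likely indirect.
- 85% success: export TORCH_CUDA_ARCH_LIST='7.0;7.5' && pip install --no-cache-dir torch --verbose
  Note: TORCH_CUDA_ARCH_LIST is read only when PyTorch or a CUDA extension is built from source. A prebuilt wheel installed by pip is not recompiled, so this step helps only if the install triggers a source build.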
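The `'7.0;7.5'` value in the second workaround is hard-coded. A minimal sketch of deriving it from the GPUs actually present, assuming a source build will consume the result, could look like the following. `arch_list_value` is a hypothetical helper name; the PyTorch device query runs only if `torch` is installed and a GPU is visible.

```python
def arch_list_value(capabilities):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST string.

    Duplicates are dropped and entries are sorted in ascending order.
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        caps = [
            torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())
        ]
        # Print a line that can be pasted into the shell before a source build.
        print(f"export TORCH_CUDA_ARCH_LIST='{arch_list_value(caps)}'")
    else:
        print("No CUDA device visible; pass capabilities to arch_list_value by hand.")
```

Calling `arch_list_value([(7, 5), (7, 0), (7, 0)])` yields `7.0;7.5`, matching the value used in the workaround.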
Dead Ends
Common approaches that don't work:
- 90% fail: Reinstallation does not change the GPU hardware or the compiled architecture targets; the mismatch persists.
- 85% fail: Driver updates do not alter cuBLAS library architecture requirements; the kernel still expects a higher compute capability.