CUBLAS_STATUS_ARCH_MISMATCH cuda runtime_error ai_generated true

RuntimeError: CUDA error: CUBLAS_STATUS_ARCH_MISMATCH when calling cublasSgemm

ID: cuda/cublas-api-not-found

Fix Rate: 82%
Confidence: 85%
Evidence: 1
First Seen: 2023-05-12

Version Compatibility

Version                    Status  Introduced  Deprecated  Notes
CUDA 11.8                  active  -           -           -
cuBLAS 11.11               active  -           -           -
PyTorch 2.0.1              active  -           -           -
NVIDIA Driver 525.85.05    active  -           -           -

Root Cause

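cuBLAS returns CUBLAS_STATUS_ARCH_MISMATCH when the routine being called requires a feature that the device architecture lacks, that is, when the GPU's compute capability is too low for the kernel being invoked. A typical case is code compiled for sm_80 or later running on a GPU that supports only sm_70 or earlier.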
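The comparison described above can be sketched in Python. This is an illustrative check, not part of any official tooling: `is_arch_supported` is a hypothetical helper, and it inspects the architectures PyTorch itself was built for (`torch.cuda.get_arch_list()`) as a proxy, whereas the cuBLAS library ships its own kernels. It assumes the usual CUDA compatibility rules: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer device.

```python
import re

try:
    import torch  # optional: only needed for the live check at the bottom
except ImportError:
    torch = None

# Matches entries such as "sm_70", "compute_90", or "sm_90a".
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)([a-z]?)$")


def is_arch_supported(capability, arch_list):
    """Hypothetical helper: True if a device with `capability` (major, minor)
    can run code built for at least one entry of `arch_list`."""
    major, minor = capability
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue  # unknown format: ignore rather than guess
        kind = match.group(1)
        built = (int(match.group(2)), int(match.group(3)))
        if match.group(4):
            # Suffixed targets (e.g. sm_90a) are treated conservatively
            # as exact-match only.
            if built == (major, minor):
                return True
            continue
        if kind == "sm" and built[0] == major and built[1] <= minor:
            return True  # binary compatible within the same major version
        if kind == "compute" and built <= (major, minor):
            return True  # PTX can be JIT-compiled for a newer device
    return False


if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(f"device sm_{cap[0]}{cap[1]}, build targets: {archs}")
    print("supported by this build:", is_arch_supported(cap, archs))
```

If the live check reports that the device is not covered by the build targets, that is consistent with the root cause above, though a passing check does not rule out a mismatch inside cuBLAS itself.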

Official Documentation

https://docs.nvidia.com/cuda/cublas/index.html#cublas-status-t

Workarounds

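  1. 70% success: set the cuBLAS workspace configuration before running the script.
    export CUBLAS_WORKSPACE_CONFIG=":4096:8" && python your_script.py
  2. 85% success: restrict the target architectures to those the GPU supports (here sm_70 and sm_75), then reinstall PyTorch. Note that TORCH_CUDA_ARCH_LIST is read only when PyTorch or a CUDA extension is compiled from source; a prebuilt wheel is unaffected by it.
    export TORCH_CUDA_ARCH_LIST='7.0;7.5' && pip install --no-cache-dir torch --verbose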
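The value passed to TORCH_CUDA_ARCH_LIST in workaround 2 should match the GPUs actually installed. A minimal sketch of deriving it, assuming PyTorch is importable and can see the devices; `arch_list_value` is a hypothetical helper name, not a PyTorch API:

```python
try:
    import torch  # optional: only needed to query the local GPUs
except ImportError:
    torch = None


def arch_list_value(capabilities):
    """Hypothetical helper: format (major, minor) pairs the way
    TORCH_CUDA_ARCH_LIST expects, e.g. "7.0;7.5"."""
    unique = sorted(set(capabilities))  # drop duplicates, lowest first
    return ";".join(f"{major}.{minor}" for major, minor in unique)


if torch is not None and torch.cuda.is_available():
    caps = [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]
    # Print a line that can be pasted into the shell before a source build.
    print(f"export TORCH_CUDA_ARCH_LIST='{arch_list_value(caps)}'")
```

On a machine with one sm_70 and one sm_75 device this would produce the same `'7.0;7.5'` value used in the workaround.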

Dead Ends

Common approaches that don't work:

  1. 90% fail: reinstalling the same build

    Reinstallation changes neither the GPU hardware nor the architecture targets the binaries were compiled for, so the mismatch persists.

  2. 85% fail: updating the NVIDIA driver

    A driver update does not alter the architecture requirements of the cuBLAS library; the kernel still expects a higher compute capability.