CUDNN_STATUS_ARCH_MISMATCH cuda type_error ai_generated true

RuntimeError: Tensor Cores are not supported on the current device architecture (compute capability < 7.0)

ID: cuda/tensor-core-unsupported-arch

90% Fix Rate
86% Confidence
1 Evidence
2024-01-20 First Seen

Version Compatibility

Version     Status   Introduced   Deprecated   Notes
CUDA 11.0   active
CUDA 12.1   active
CUDA 12.4   active

Root Cause

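The GPU's compute capability is below 7.0. Tensor Cores first appeared with Volta (compute capability 7.0), so earlier GPUs cannot run the Tensor Core kernels that float16 mixed-precision training relies on. bfloat16 Tensor Core support arrived later, with Ampere (compute capability 8.0).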
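To make the capability thresholds concrete, here is a minimal sketch that maps a (major, minor) compute capability to the Tensor Core features it can use. The helper names `tensor_core_features` and `current_capability` are hypothetical and not part of PyTorch. The 8.0 cutoff for bfloat16 and TF32 is an assumption based on NVIDIA's architecture generations, not something this entry states.

```python
def tensor_core_features(capability):
    """Map a (major, minor) compute capability to Tensor Core features.

    Assumed thresholds: Volta (7.0) introduced float16 Tensor Cores;
    Ampere (8.0) added the bfloat16 and TF32 modes.
    """
    cap = tuple(capability)
    return {
        "float16": cap >= (7, 0),
        "bfloat16": cap >= (8, 0),
        "tf32": cap >= (8, 0),
    }


def current_capability(device_index=0):
    """Return the device's (major, minor) capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    cap = current_capability()
    if cap is None:
        print("No CUDA device visible")
    else:
        print(cap, tensor_core_features(cap))
```

On a compute capability 6.1 card, for instance, every entry in the returned dictionary is False, which matches the condition this error reports.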

generic


Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetTensorNdDescriptor

Workarounds

  1. 90% success Run in float32 instead of float16: replace model.half() with model.float(), and in training use torch.amp.autocast(device_type='cuda', enabled=False). Also set torch.backends.cuda.matmul.allow_tf32 = False and torch.backends.cudnn.allow_tf32 = False so that no TF32 Tensor Core path is requested; TF32 itself exists only on Ampere and newer GPUs.
  2. 95% success If Tensor Cores are essential, migrate to a GPU with compute capability >= 7.0 (e.g., Tesla V100 at 7.0, RTX 20 series at 7.5, or newer). Check your GPU's compute capability at https://developer.nvidia.com/cuda-gpus.
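The first workaround can be sketched as a guard that picks the precision from the device's compute capability and disables autocast on older GPUs. This is a minimal sketch, not a definitive implementation: `choose_precision` and `prepare` are hypothetical helper names, and the 7.0 threshold is the one quoted in the error message.

```python
def choose_precision(capability):
    """Return 'float16' for compute capability >= 7.0, otherwise 'float32'."""
    if capability is not None and tuple(capability) >= (7, 0):
        return "float16"
    return "float32"


def prepare(model, device_index=0):
    """Return (model, autocast_context) suited to the current GPU.

    Hypothetical helper: on pre-Volta GPUs, or with no GPU at all, it casts
    the model to float32 and hands back a disabled autocast context.
    """
    import torch  # imported lazily so choose_precision works without PyTorch

    cap = None
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(device_index)

    use_amp = choose_precision(cap) == "float16"
    if not use_amp:
        # Disable the TF32 paths as the workaround suggests, then use float32.
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
        model = model.float()

    autocast = torch.amp.autocast(
        device_type="cuda", dtype=torch.float16, enabled=use_amp
    )
    return model, autocast
```

A training step would then run its forward pass as `model, ctx = prepare(model)` followed by `with ctx: loss = loss_fn(model(x), y)`, where `loss_fn`, `x`, and `y` stand in for your own objects. On supported hardware the same code keeps mixed precision enabled, so the fallback does not need a separate code path.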

Dead Ends

Common approaches that don't work:

  1. 90% fail Upgrading the CUDA toolkit does not add Tensor Core support to older GPU architectures.
  2. 80% fail Setting environment variable CUDA_LAUNCH_BLOCKING=1 does not enable Tensor Cores; it only serializes kernel launches.