CUDNN_STATUS_ARCH_MISMATCH cuda type_error ai_generated true

RuntimeError: Tensor Cores are not supported on the current device architecture (compute capability < 7.0)

ID: cuda/tensor-core-unsupported-arch


Fix Rate: 90%

Confidence: 86%

Evidence: 1

First Seen: 2024-01-20

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.0	active	—	—	—
CUDA 12.1	active	—	—	—
CUDA 12.4	active	—	—	—

Root Cause

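The GPU's compute capability is below 7.0 (Volta), the first architecture with Tensor Cores. Tensor Core operations, such as mixed-precision training with float16, require compute capability 7.0 or higher; bfloat16 Tensor Core math additionally requires 8.0 (Ampere) or newer.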

generic

Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetTensorNdDescriptor

Workarounds

90% success Fall back to float32 instead of float16: replace model.half() with model.float(), and run training under torch.amp.autocast(device_type='cuda', enabled=False). You can also set torch.backends.cuda.matmul.allow_tf32 = False and torch.backends.cudnn.allow_tf32 = False, although TF32 is only used on Ampere or newer GPUs, so these flags should have no effect on a device below compute capability 7.0.
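A minimal sketch of this workaround, assuming PyTorch is installed, a CUDA device is visible, and the model and batch already live on that device. The helper names precision_plan and forward_with_plan are hypothetical and not part of PyTorch; only torch.cuda.get_device_capability, the two allow_tf32 flags, model.float(), and torch.amp.autocast are library calls.

```python
def precision_plan(capability):
    """Map a (major, minor) compute capability to precision settings.

    Hypothetical helper: below 7.0 autocast is disabled, so all math stays in float32.
    """
    tensor_cores = tuple(capability) >= (7, 0)
    return {
        "autocast_enabled": tensor_cores,  # float16 autocast only where Tensor Cores exist
        "allow_tf32": False,               # mirrors the flags named in the workaround
    }


def forward_with_plan(model, batch, device_index=0):
    """Run one forward pass using the plan for the given CUDA device."""
    import torch  # imported lazily so precision_plan works without PyTorch

    plan = precision_plan(torch.cuda.get_device_capability(device_index))
    torch.backends.cuda.matmul.allow_tf32 = plan["allow_tf32"]
    torch.backends.cudnn.allow_tf32 = plan["allow_tf32"]

    model = model.float()  # replaces model.half(): keep parameters in float32
    with torch.amp.autocast(device_type="cuda", enabled=plan["autocast_enabled"]):
        return model(batch)
```

Deciding from the reported capability, rather than hard-coding enabled=False, lets the same script keep mixed precision on newer GPUs while staying in float32 on older ones.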
95% success If Tensor Cores are essential, move to a GPU with compute capability >= 7.0 (e.g., Tesla V100, RTX 20 series, or newer). Look up your GPU's compute capability at https://developer.nvidia.com/cuda-gpus.
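The compute capability can also be read programmatically before deciding whether a hardware change is needed. The sketch below assumes PyTorch built with CUDA support; describe_device and report_devices are hypothetical helper names, while torch.cuda.device_count, torch.cuda.get_device_name, and torch.cuda.get_device_capability are existing PyTorch calls.

```python
MIN_TENSOR_CORE_CAPABILITY = (7, 0)  # Volta, the threshold named in this entry


def describe_device(name, capability):
    """Format one device's capability and whether it meets the 7.0 threshold."""
    major, minor = capability
    if (major, minor) >= MIN_TENSOR_CORE_CAPABILITY:
        status = "Tensor Cores available"
    else:
        status = "no Tensor Cores"
    return f"{name}: compute capability {major}.{minor} ({status})"


def report_devices():
    """Describe every visible CUDA device (requires PyTorch with CUDA)."""
    import torch  # imported lazily so describe_device works without PyTorch

    return [
        describe_device(
            torch.cuda.get_device_name(index),
            torch.cuda.get_device_capability(index),
        )
        for index in range(torch.cuda.device_count())
    ]
```

Calling print("\n".join(report_devices())) on the affected machine lists each GPU with its capability; an empty result means PyTorch sees no CUDA device at all.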

Dead Ends

Common approaches that don't work:

90% fail
Upgrading the CUDA toolkit does not add Tensor Core support to older GPU architectures.
80% fail
Setting environment variable CUDA_LAUNCH_BLOCKING=1 does not enable Tensor Cores; it only serializes kernel launches.