CUDNN_STATUS_ARCH_MISMATCH cuda type_error ai_generated true

RuntimeError: Tensor Cores are not supported on the current device architecture (compute capability < 7.0)

ID: cuda/tensor-core-unsupported-arch

Fix rate: 90%

Confidence: 86%

Evidence count: 1

First seen: 2024-01-20

Version compatibility

Version	Status	Introduced	Deprecated	Notes
CUDA 11.0	active	—	—	—
CUDA 12.1	active	—	—	—
CUDA 12.4	active	—	—	—

Root cause analysis

The GPU's compute capability is below 7.0 (Volta), the first architecture with Tensor Cores. Tensor Core operations such as mixed-precision training with float16 need at least 7.0; the bfloat16 and TF32 Tensor Core paths need 8.0 (Ampere) or newer.

generic

Official documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetTensorNdDescriptor

Solutions

Avoid Tensor Core code paths by running in float32 instead of float16: replace model.half() with model.float(), and in the training loop use torch.amp.autocast(device_type='cuda', enabled=False). Also set torch.backends.cuda.matmul.allow_tf32 = False and torch.backends.cudnn.allow_tf32 = False. TF32 only applies on compute capability 8.0 or newer, so on an older GPU these two flags are a safeguard rather than the fix itself.
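
The float32 fallback can be sketched as follows. This is a minimal example that assumes PyTorch is installed and a CUDA device is visible; `needs_float32_fallback` and `prepare_model` are hypothetical helper names introduced here for illustration, not PyTorch APIs.

```python
# Tensor Cores first appeared with Volta, compute capability 7.0.
MIN_TENSOR_CORE_CAPABILITY = (7, 0)


def needs_float32_fallback(capability):
    """Return True when a (major, minor) capability predates Tensor Cores."""
    return tuple(capability) < MIN_TENSOR_CORE_CAPABILITY


def prepare_model(model):
    """Hypothetical helper: switch the model to float32 on pre-Volta GPUs.

    Returns the value to pass as `enabled=` to torch.amp.autocast.
    """
    import torch  # lazy import so the check above is usable without PyTorch

    # Assumption: the current CUDA device is the one the model will run on.
    fallback = needs_float32_fallback(torch.cuda.get_device_capability())
    if fallback:
        model.float()  # undo any earlier model.half()
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
    return not fallback
```

In a training loop, the returned flag would typically be used as `amp_enabled = prepare_model(model)` followed by `with torch.amp.autocast(device_type='cuda', enabled=amp_enabled):` around the forward pass, so the same script runs on both older and newer GPUs.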

If Tensor Cores are essential, migrate to a GPU with compute capability >= 7.0 (e.g., Tesla V100, RTX 20 series, or newer). Check your GPU's compute capability at https://developer.nvidia.com/cuda-gpus.
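
Before deciding whether to migrate, it may help to check the installed GPUs locally instead of looking them up by hand. The sketch below assumes PyTorch is available; `tensor_core_precisions` and `report_local_gpus` are hypothetical names, and the mapping covers only floating-point Tensor Core input types.

```python
def tensor_core_precisions(capability):
    """Map a (major, minor) compute capability to Tensor Core float types."""
    capability = tuple(capability)
    if capability < (7, 0):
        return []  # Pascal and older: no Tensor Cores
    if capability < (8, 0):
        return ["float16"]  # Volta (7.x) and Turing (7.5)
    return ["float16", "bfloat16", "tf32"]  # Ampere (8.0) and newer


def report_local_gpus():
    """Print name, capability, and supported types for each visible GPU."""
    import torch  # lazy import so the mapping above works without PyTorch

    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        capability = torch.cuda.get_device_capability(index)
        supported = tensor_core_precisions(capability) or "no Tensor Cores"
        print(f"{name}: capability {capability}, {supported}")
```

An empty list from `tensor_core_precisions` means the device falls under this error, and only the float32 path or a hardware change will resolve it.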

Ineffective attempts

Common but ineffective approaches:

90% failure rate
Upgrading the CUDA toolkit does not add Tensor Core support to older GPU architectures.
80% failure rate
Setting the environment variable CUDA_LAUNCH_BLOCKING=1 does not enable Tensor Cores; it only serializes kernel launches.