CUDNN_STATUS_ARCH_MISMATCH cuda type_error ai_generated true

RuntimeError: Tensor Cores are not supported on the current device architecture (compute capability < 7.0)

ID: cuda/tensor-core-unsupported-arch

Fix rate: 90%
Confidence: 86%
Evidence count: 1
First seen: 2024-01-20

Version compatibility

Version      Status    Introduced    Deprecated    Notes
CUDA 11.0    active
CUDA 12.1    active
CUDA 12.4    active

Root cause analysis

The GPU's compute capability is below 7.0 (Volta), the first architecture with Tensor Cores. Tensor Core operations, such as mixed-precision training with float16, therefore cannot run on this device. bfloat16 Tensor Core support additionally requires compute capability 8.0 (Ampere) or newer.

generic
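
To confirm the root cause on a given machine, the compute capability can be compared against the 7.0 threshold. The following is a minimal sketch, assuming PyTorch is the framework in use; the helper names supports_tensor_cores and detect_capability are hypothetical and chosen for illustration only.

```python
from typing import Optional, Tuple


def supports_tensor_cores(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) is at least 7.0 (Volta)."""
    # Tuples compare element by element, so (7, 5) >= (7, 0) and (6, 1) < (7, 0).
    return capability >= (7, 0)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the current device's compute capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the check above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]}, "
              f"Tensor Cores available: {supports_tensor_cores(cap)}")
```

A result of False from supports_tensor_cores indicates that the device falls into the case described in this entry.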

Official documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetTensorNdDescriptor

Solutions

  1. Fall back to float32 so that no Tensor Core code path is requested. Replace model.half() with model.float(), and during training disable autocast with torch.amp.autocast(device_type='cuda', enabled=False). You can also set torch.backends.cuda.matmul.allow_tf32 = False and torch.backends.cudnn.allow_tf32 = False, although TF32 only applies to Ampere (compute capability 8.0) and newer GPUs.
  2. If Tensor Cores are essential, migrate to a GPU with compute capability >= 7.0 (e.g., Tesla V100, RTX 20 series, or newer). Check your GPU's compute capability at https://developer.nvidia.com/cuda-gpus.
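
The two solutions above can be combined into a single precision choice that depends on the detected hardware. This is a sketch under stated assumptions, not a definitive implementation: choose_precision is a hypothetical helper, and the thresholds (7.0 for float16, 8.0 for bfloat16) follow the architecture generations that introduced Tensor Core support for each type.

```python
from typing import Tuple


def choose_precision(capability: Tuple[int, int]) -> str:
    """Map a (major, minor) compute capability to a dtype name.

    Hypothetical helper for illustration; returns a string so it can be
    used and tested without PyTorch installed.
    """
    if capability >= (8, 0):
        return "bfloat16"  # Ampere or newer
    if capability >= (7, 0):
        return "float16"   # Volta or Turing
    return "float32"       # pre-Volta: no Tensor Cores, stay in full precision


# Usage with PyTorch (assumes torch is installed and a CUDA device is present):
#   import torch
#   name = choose_precision(torch.cuda.get_device_capability())
#   amp_dtype = getattr(torch, name) if name != "float32" else None
#   with torch.amp.autocast(device_type="cuda", dtype=amp_dtype,
#                           enabled=amp_dtype is not None):
#       output = model(inputs)
```

On a pre-Volta GPU this resolves to float32 with autocast disabled, which matches solution 1. On newer hardware mixed precision stays enabled without code changes.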

Ineffective attempts

Common but ineffective approaches:

  1. 90% failure rate

    Upgrading the CUDA toolkit does not add Tensor Core support to older GPU architectures.

  2. 80% failure rate

    Setting environment variable CUDA_LAUNCH_BLOCKING=1 does not enable Tensor Cores; it only serializes kernel launches.