RuntimeError: NVRTC compilation failed: error: Ptx assembly requires .target sm_52 or higher. Current target: sm_50
ID: cuda/nvrtc-ptx-arch-mismatch
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.7 | active | — | — | — |
| CUDA 12.0 | active | — | — | — |
| NVRTC 11.7 | active | — | — | — |
| NVRTC 12.0 | active | — | — | — |
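Root cause analysis
The PTX being assembled, whether inline asm in the kernel source or PTX produced through NVRTC, uses at least one instruction or directive that the assembler accepts only for .target sm_52 (compute capability 5.2, second-generation Maxwell) or higher. The compilation target, however, is sm_50 (compute capability 5.0, first-generation Maxwell). The error is a mismatch between the architecture the code needs and the architecture passed as the target. Which instruction triggers it depends on the kernel.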
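As a diagnostic aid, the two architectures named in the message can be extracted from a log and compared. The following is a minimal Python sketch. It assumes the log contains the English message exactly as quoted at the top of this page. `parse_arch_mismatch` is a hypothetical helper and is not part of NVRTC.

```python
import re

# Matches the English message quoted at the top of this page, e.g.
# "... requires .target sm_52 or higher. Current target: sm_50".
# (\d+)(\d) splits "sm_52" into 5 and 2, and "sm_100" into 10 and 0.
_MISMATCH = re.compile(
    r"requires \.target sm_(\d+)(\d) or higher\. Current target: sm_(\d+)(\d)"
)


def parse_arch_mismatch(message):
    """Return (required_cc, current_cc) as (major, minor) tuples, or None."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    req_major, req_minor, cur_major, cur_minor = map(int, match.groups())
    return (req_major, req_minor), (cur_major, cur_minor)


msg = ("RuntimeError: NVRTC compilation failed: error: Ptx assembly requires "
       ".target sm_52 or higher. Current target: sm_50")
required, current = parse_arch_mismatch(msg)
print(required, current, current < required)  # (5, 2) (5, 0) True
```

Tuples compare element by element, so (5, 0) < (5, 2) expresses "the current target is below the required one" directly.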
Official documentation
https://docs.nvidia.com/cuda/nvrtc/index.html#nvrtc-compilation
Solutions
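- Compile for sm_52 or higher by setting the target architecture in the NVRTC options. For example, pass '--gpu-architecture=sm_52' (short form '-arch=sm_52'), or pass the device's own architecture if it is newer. The CUDAARCHS environment variable only sets the default CUDA architectures for CMake builds. It does not change the target of runtime compilation through NVRTC. This fix only helps if the GPU itself has compute capability 5.2 or higher.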
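- First check the GPU's compute capability, for example with cudaDeviceGetAttribute and cudaDevAttrComputeCapabilityMajor/Minor. If it is 5.2 or higher (e.g., Tesla M40 or newer), target that architecture. The CUDA 11.7 and 12.0 toolkits listed above both support sm_52, so no toolkit upgrade is needed. If the GPU has compute capability 5.0 (e.g., GeForce GTX 750 Ti), replace the PTX assembly with equivalent CUDA C++ code that compiles without it.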
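The two solutions can be combined into a single decision: always target the device's own architecture, and use the inline-PTX source only when the device is sm_52 or newer. The following is a minimal Python sketch built on these assumptions:

- The device's compute capability has already been read as a (major, minor) tuple, for example via cudaDeviceGetAttribute.
- Both kernel strings and the helper plan_compilation are hypothetical.
- The add.u32 instruction is only a placeholder for whatever instruction actually needs sm_52.

The returned option list is what would be passed to nvrtcCompileProgram. With an 'sm_XY' target, NVRTC from CUDA 11.1 onward produces a cubin, which is retrieved with nvrtcGetCUBIN.

```python
# Minimum named in the error message (".target sm_52 or higher").
REQUIRED_CC = (5, 2)

# Hypothetical kernel that uses inline PTX. The add.u32 instruction is only a
# placeholder for whatever instruction in your code actually needs sm_52.
PTX_KERNEL = r"""
extern "C" __global__ void add_one(unsigned *data, unsigned n) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned out;
        asm("add.u32 %0, %1, 1;" : "=r"(out) : "r"(data[i]));
        data[i] = out;
    }
}
"""

# Hypothetical equivalent written in plain CUDA C++, with no inline PTX.
FALLBACK_KERNEL = r"""
extern "C" __global__ void add_one(unsigned *data, unsigned n) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] + 1u;
    }
}
"""


def plan_compilation(device_cc, required_cc=REQUIRED_CC):
    """Pick kernel source and NVRTC options for a (major, minor) device."""
    major, minor = device_cc
    # Target the device's own architecture: this is never below sm_52 on the
    # PTX path and never newer than the device itself.
    options = [f"--gpu-architecture=sm_{major}{minor}"]
    if device_cc >= required_cc:
        return PTX_KERNEL, options
    # sm_50-only device: a cubin built for sm_52 would not load here,
    # so switch to the PTX-free source instead.
    return FALLBACK_KERNEL, options


source, options = plan_compilation((5, 0))
print(source is FALLBACK_KERNEL, options)  # True ['--gpu-architecture=sm_50']
```

A single-source alternative is to guard the inline asm with '#if __CUDA_ARCH__ >= 520' and keep the plain C++ path in the '#else' branch. This works because NVRTC defines __CUDA_ARCH__ for the architecture being compiled.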
Ineffective attempts
Common but ineffective approaches:
- 60% failure rate: Removing the PTX instructions entirely. The error disappears only because no PTX is assembled any more, and the kernel may lose the functionality those instructions provided.
- 95% failure rate: Targeting a much newer architecture, such as sm_86, on an older GPU (e.g., sm_50). This only trades the error for a different one when the kernel is loaded: 'no kernel image is available for execution on the device'.
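Both failure modes follow from the CUDA binary compatibility rule: a cubin built for sm_XY runs only on a device with the same major version and an equal or higher minor version. The following is a minimal Python sketch with the hypothetical helpers cubin_loads_on and expected_outcome. It assumes NVRTC is asked for a cubin (an 'sm_XY' target) and that the kernel contains PTX requiring sm_52.

```python
def cubin_loads_on(target_cc, device_cc):
    """A cubin for sm_XY loads only on devices with the same major version
    and an equal or higher minor version."""
    return target_cc[0] == device_cc[0] and device_cc[1] >= target_cc[1]


def expected_outcome(target_cc, device_cc, required_cc=(5, 2)):
    """Predict what happens when a kernel whose PTX needs required_cc is
    compiled to a cubin for target_cc and loaded on device_cc."""
    if target_cc < required_cc:
        # The assembler rejects the PTX: the error on this page.
        return "PTX assembly error"
    if not cubin_loads_on(target_cc, device_cc):
        return "no kernel image is available for execution on the device"
    return "ok"


# On an sm_50 device, every target ends in one of the two errors.
for target in [(5, 0), (5, 2), (8, 6)]:
    print(target, "->", expected_outcome(target, device_cc=(5, 0)))
```

Even an sm_52 target fails on an sm_50 device, because a cubin cannot run on a lower minor revision. On such a GPU, only the PTX-free rewrite from the solutions above avoids both errors.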