NVRTC_ERROR_COMPILATION (6)
Tags: cuda · build_error · ai_generated: true

RuntimeError: NVRTC compilation failed: error: Ptx assembly requires .target sm_52 or higher. Current target: sm_50

ID: cuda/nvrtc-ptx-arch-mismatch

Fix Rate: 92%
Confidence: 88%
Evidence: 1
First Seen: 2023-03-10

Version Compatibility

Version      Status   Introduced   Deprecated   Notes
CUDA 11.7    active
CUDA 12.0    active
NVRTC 11.7   active
NVRTC 12.0   active

Root Cause

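The kernel contains PTX (inline asm, or PTX passed through NVRTC) that declares or needs a target of sm_52 or higher, through its .target directive or an instruction restricted to that target, while the compilation targets sm_50 (first-generation Maxwell, compute capability 5.0). PTX as such does not require compute capability 5.2, and sm_50 already supports unified addressing and native atomics; the failure comes from the mismatch between what this particular PTX requires and the lower target selected for compilation.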
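To see which side of the mismatch needs to change, it can help to pull the two architecture numbers out of the error text. The following is a minimal sketch, assuming the message has exactly the wording quoted in this entry; the helper name `parse_target_mismatch` is hypothetical and other NVRTC versions may phrase the error differently.

```python
import re

# Matches the wording quoted in this entry. This is an assumption about the
# message format, not a documented NVRTC contract.
_TARGET_RE = re.compile(
    r"requires \.target sm_(\d+) or higher\. Current target: sm_(\d+)"
)


def parse_target_mismatch(message):
    """Return (required, current) as integers, or None if the text does not match."""
    match = _TARGET_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


msg = (
    "NVRTC compilation failed: error: Ptx assembly requires "
    ".target sm_52 or higher. Current target: sm_50"
)
print(parse_target_mismatch(msg))  # prints (52, 50)
```

The first number is the minimum target the PTX asks for and the second is the target NVRTC was told to compile for.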


Official Documentation

https://docs.nvidia.com/cuda/nvrtc/index.html#nvrtc-compilation

Workarounds

  1. 95% success Set the NVRTC target architecture to sm_52 or higher, for example by passing '--gpu-architecture=sm_52' (short form '-arch=sm_52') in the options given to nvrtcCompileProgram. This only helps if the GPU itself has compute capability 5.2 or higher. The CUDAARCHS environment variable is read by CMake for nvcc builds and has no effect on NVRTC.
  2. 85% success If the GPU actually supports sm_52 (e.g., Tesla M40 or newer), compile for the device's real compute capability rather than sm_50; the CUDA 11.7 and 12.0 toolkits listed above both support sm_52, so no toolkit upgrade is needed. If the GPU is sm_50 only (e.g., GeForce GTX 750 Ti), replace the inline PTX with equivalent CUDA C++ code that does not depend on a target above sm_50.
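The decision described in the two workarounds can be sketched as a small helper. This is an illustration only: the function names are hypothetical, the (major, minor) capability tuple is assumed to come from whatever device query your framework provides, and the default minimum of (5, 2) is taken from the error message in this entry.

```python
def nvrtc_arch_option(capability):
    """Format a (major, minor) compute capability as an NVRTC architecture option."""
    major, minor = capability
    return f"--gpu-architecture=sm_{major}{minor}"


def choose_nvrtc_option(device_capability, ptx_minimum=(5, 2)):
    """Return the option for the device's own capability, or None when the
    device is below what the PTX needs and the PTX has to be rewritten."""
    if device_capability < ptx_minimum:
        return None
    return nvrtc_arch_option(device_capability)


print(choose_nvrtc_option((5, 2)))  # prints --gpu-architecture=sm_52
print(choose_nvrtc_option((8, 6)))  # prints --gpu-architecture=sm_86
print(choose_nvrtc_option((5, 0)))  # prints None
```

Targeting the device's own capability, rather than a fixed sm_52, avoids compiling for an architecture the GPU cannot run. A result of None corresponds to the second workaround: the GPU is below the PTX minimum, so the inline PTX needs to be replaced.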


Dead Ends

Common approaches that don't work:

  1. 60% fail

    Deleting the inline PTX outright makes the error disappear, since it is only raised when the offending PTX is assembled, but it may break the kernel unless the removed instructions are replaced with equivalent code.

  2. 95% fail

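    Setting a much higher architecture such as sm_86 while running on an older GPU (e.g., an sm_50 device) lets compilation succeed but then fails with a different error: 'no kernel image is available for execution on the device'. The same applies to sm_52 if the GPU is really sm_50.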
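The second dead end follows from the general compatibility rules in the CUDA programming guide: a binary built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while PTX for compute_XY can be JIT-compiled for any device of equal or higher capability. A minimal sketch of those two rules, with hypothetical function names and ignoring special cases such as architecture-specific targets (for example sm_90a):

```python
def cubin_loads_on(target, device):
    """Binary rule: same major version, device minor >= target minor."""
    return target[0] == device[0] and target[1] <= device[1]


def ptx_jits_on(target, device):
    """PTX rule: the device capability must be at least the PTX target."""
    return target <= device


device = (5, 0)  # an sm_50 GPU, as in this entry
print(cubin_loads_on((8, 6), device))  # prints False
print(cubin_loads_on((5, 2), device))  # prints False
print(cubin_loads_on((5, 0), device))  # prints True
print(ptx_jits_on((5, 0), (8, 6)))     # prints True
```

Under these rules neither sm_86 nor sm_52 output can run on an sm_50 device, which is why raising the target is only a fix when the hardware is at least as new as the target.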