cuda
build_error
ai_generated
partial
RuntimeError: Triton compilation failed: unsupported instruction 'mma.sync.aligned.m16n8k16.row.col.f16.f16.f16'
ID: cuda/triton-asm-unsupported
Fix Rate: 80%
Confidence: 86%
Evidence: 1
First Seen: 2024-06-12
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| Triton 2.1.0 | active | — | — | — |
| PyTorch 2.2.0 | active | — | — | — |
| NVIDIA T4 (sm_75) | active | — | — | — |
Root Cause
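A Triton kernel emits a PTX instruction (here `mma.sync` with the `m16n8k16` shape and f16 operands) that the target GPU architecture does not support. This usually happens on an older GPU, such as the T4 (sm_75) listed above, or when the kernel is compiled for the wrong compute capability.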
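When triaging this error, it can help to pull the offending instruction and its tile shape out of the message before checking the GPU. The following is a minimal sketch that assumes the message has exactly the format shown in this entry's title; `parse_unsupported_instruction` is a hypothetical helper, not part of Triton or PyTorch.

```python
import re

# Error text copied from this entry's title.
ERROR = ("RuntimeError: Triton compilation failed: unsupported instruction "
         "'mma.sync.aligned.m16n8k16.row.col.f16.f16.f16'")

def parse_unsupported_instruction(message):
    """Hypothetical helper: return the quoted instruction and its MxNxK shape."""
    match = re.search(r"unsupported instruction '([^']+)'", message)
    if match is None:
        return None  # not the error format described in this entry
    instruction = match.group(1)
    # Look for a '.m<M>n<N>k<K>' qualifier such as '.m16n8k16'.
    shape = re.search(r"\.m(\d+)n(\d+)k(\d+)(?:\.|$)", instruction)
    return {
        "instruction": instruction,
        "shape": tuple(int(g) for g in shape.groups()) if shape else None,
    }

info = parse_unsupported_instruction(ERROR)
print(info["shape"])  # (16, 8, 16)
```

The extracted shape can then be compared against what the target architecture is documented to support.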
Official Documentation
https://triton-lang.org/main/index.html
Workarounds
- 80% success: Run the kernel on a GPU with compute capability >= 8.0 (Ampere or newer). Alternatively, keep the workload off the Triton path; this entry suggests setting the environment variable `TORCHDYNAMO_USE_TRITON=0` to fall back to CUDA kernels.
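The workaround above can also be applied in code rather than through the environment. This is a rough sketch assuming PyTorch 2.x: it picks a `torch.compile` backend from the device's compute capability. `choose_compile_backend` and `compile_for_current_gpu` are hypothetical helpers, the 8.0 threshold is taken from this entry, and falling back for every pre-Ampere GPU is a deliberately conservative choice, since many Triton kernels run fine on older hardware.

```python
# Minimum compute capability this entry reports for the failing instruction.
MIN_CAPABILITY = (8, 0)

def choose_compile_backend(capability):
    """Hypothetical helper: pick a torch.compile backend from (major, minor)."""
    # "inductor" generates Triton kernels on GPU; "aot_eager" does not.
    return "inductor" if tuple(capability) >= MIN_CAPABILITY else "aot_eager"

def compile_for_current_gpu(model):
    """Compile `model` for the visible GPU (requires PyTorch and a CUDA device)."""
    import torch  # imported lazily so the helper above works without PyTorch
    capability = torch.cuda.get_device_capability()
    return torch.compile(model, backend=choose_compile_backend(capability))

print(choose_compile_backend((7, 5)))  # T4 (sm_75): aot_eager
print(choose_compile_backend((8, 0)))  # sm_80: inductor
```

On a machine with a GPU, `compile_for_current_gpu(model)` would replace a direct `torch.compile(model)` call. The `aot_eager` backend gives up Inductor's generated kernels, so expect slower execution in exchange for avoiding the compilation failure.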
Dead Ends
Common approaches that don't work:
- 70% fail: The error is hardware-limited, not software.
- 90% fail: Triton uses its own JIT compiler, independent of TensorExpr.