cuda build_error ai_generated partial

RuntimeError: Triton compilation failed: unsupported instruction 'mma.sync.aligned.m16n8k16.row.col.f16.f16.f16'

ID: cuda/triton-asm-unsupported


Fix Rate: 80%

Confidence: 86%

Evidence: 1

First Seen: 2024-06-12

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
Triton 2.1.0	active	—	—	—
PyTorch 2.2.0	active	—	—	—
NVIDIA T4 (sm_75)	active	—	—	—

Root Cause

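The Triton kernel was compiled to a PTX instruction that the target GPU architecture does not support. In this error it is the tensor-core instruction mma.sync.aligned.m16n8k16 with f16 operands. This usually means the GPU is older than the kernel assumes (the NVIDIA T4 listed above is sm_75, while the workaround in this entry calls for compute capability 8.0 or higher), or the kernel was compiled for the wrong compute capability.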
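
The root cause comes down to a compute-capability comparison, which can be checked before compiling anything. The following is a minimal sketch, not part of the original entry: the helper names `supports_required_mma` and `describe_current_device` are hypothetical, and the sm_80 threshold is taken from the workaround in this entry rather than verified against each kernel.

```python
# Assumption (from this entry): the failing mma.sync shape needs sm_80 or newer.
MIN_CAPABILITY = (8, 0)


def supports_required_mma(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


def describe_current_device():
    """Report whether the first visible CUDA device meets the minimum."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device is visible"
    capability = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "meets" if supports_required_mma(capability) else "is below"
    return f"{name}: sm_{capability[0]}{capability[1]} {verdict} sm_80"


if __name__ == "__main__":
    print(describe_current_device())
```

On the T4 from the table, `torch.cuda.get_device_capability` returns `(7, 5)`, so `supports_required_mma` returns `False`.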

generic


Official Documentation

https://triton-lang.org/main/index.html

Workarounds

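80% success: Run the kernel on a GPU with compute capability 8.0 or higher (Ampere or newer). Alternatively, disable Triton by setting the environment variable `TORCHDYNAMO_USE_TRITON=0` so that execution falls back to CUDA kernels. Confirm that your PyTorch build honors this variable before relying on it.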

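
The fallback idea in the workaround can also be expressed in code instead of an environment variable. The sketch below assumes the failing kernel is generated by `torch.compile` with its default Inductor backend, which the original entry does not state explicitly. It switches to the `aot_eager` backend, which runs the captured graph with regular PyTorch kernels and generates no Triton code. The helper names `pick_compile_backend` and `compile_for_device` are hypothetical.

```python
def pick_compile_backend(capability, min_capability=(8, 0)):
    """Choose a torch.compile backend from a (major, minor) compute capability.

    Assumption: devices below min_capability should avoid Triton code generation.
    """
    if tuple(capability) >= tuple(min_capability):
        return "inductor"   # default backend, generates Triton kernels on GPU
    return "aot_eager"      # traces the graph but runs it with eager kernels


def compile_for_device(model):
    """Compile `model` with a backend suited to the first visible CUDA device."""
    import torch  # imported lazily so pick_compile_backend works without PyTorch

    if not torch.cuda.is_available():
        return model  # nothing to decide without a CUDA device; run uncompiled
    backend = pick_compile_backend(torch.cuda.get_device_capability(0))
    return torch.compile(model, backend=backend)
```

With this sketch, a T4 (capability `(7, 5)`) gets `aot_eager`, which trades Inductor's fused kernels for compatibility. Newer GPUs keep the default backend.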

Dead Ends

Common approaches that don't work:

70% fail: The error is hardware-limited, not software.
90% fail: Triton uses its own JIT compiler, independent of TensorExpr.