cuda build_error ai_generated partial

RuntimeError: Triton compilation failed: unsupported instruction 'mma.sync.aligned.m16n8k16.row.col.f16.f16.f16'

ID: cuda/triton-asm-unsupported

Fix rate: 80%
Confidence: 86%
Evidence count: 1
First seen: 2024-06-12

Version compatibility

Version               Status   Introduced   Deprecated   Notes
Triton 2.1.0          active
PyTorch 2.2.0         active
NVIDIA T4 (sm_75)     active

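Root cause analysis

A Triton kernel was compiled to a PTX instruction (here `mma.sync` with the m16n8k16 f16 shape) that the target GPU architecture does not support. This usually means the GPU is too old for the instruction, or the kernel was compiled for the wrong compute capability. The f16 m16n8k16 shape requires sm_80 or newer, while the NVIDIA T4 listed above is sm_75.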
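The capability mismatch described above can be checked before any kernel is launched. The following is a minimal sketch, not part of the original entry: the helper names `supports_m16n8k16_f16` and `current_capability` are hypothetical, and the sm_80 threshold is taken from the compute capability requirement stated in this entry. It relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and it degrades to `None` when PyTorch or a GPU is absent.

```python
from typing import Optional, Tuple

# Assumed minimum (major, minor) compute capability for the f16 m16n8k16
# mma.sync shape, i.e. sm_80 (Ampere).
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def supports_m16n8k16_f16(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) capability meets the sm_80 threshold."""
    return tuple(capability) >= MIN_CAPABILITY


def current_capability() -> Optional[Tuple[int, int]]:
    """Return the active GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the pure check above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return tuple(torch.cuda.get_device_capability())


if __name__ == "__main__":
    cap = current_capability()
    if cap is None:
        print("No CUDA device (or PyTorch) found; cannot check capability.")
    elif supports_m16n8k16_f16(cap):
        print(f"sm_{cap[0]}{cap[1]}: meets the sm_80 threshold.")
    else:
        print(f"sm_{cap[0]}{cap[1]}: below sm_80; expect the Triton compile error.")
```

On the T4 from the compatibility list, `supports_m16n8k16_f16((7, 5))` evaluates to `False`, which matches the failure reported here.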

generic

Official documentation

https://triton-lang.org/main/index.html

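Solutions

  1. Run the kernel on a GPU with compute capability >= 8.0 (Ampere or newer). Alternatively, bypass Triton so that PyTorch falls back to its non-Triton CUDA kernels; the reported way to do this is setting the environment variable `TORCHDYNAMO_USE_TRITON=0`, which should be checked against the installed PyTorch version.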
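One way to apply the fallback idea from code is to choose the `torch.compile` backend based on the device capability. This is a sketch under stated assumptions rather than the fix this entry verified: it assumes the failing kernel is generated through `torch.compile` with the default `inductor` backend, it swaps in the built-in `eager` backend (which does not generate Triton kernels) in place of the environment variable above, and the helper names `pick_backend` and `compile_for_device` are hypothetical.

```python
from typing import Optional, Tuple


def pick_backend(capability: Optional[Tuple[int, int]]) -> str:
    """Choose a torch.compile backend name for a (major, minor) capability.

    Assumption: "inductor" is only selected on sm_80 or newer. Anything older,
    or an unknown device (None), conservatively gets the "eager" backend.
    """
    if capability is not None and tuple(capability) >= (8, 0):
        return "inductor"
    return "eager"


def compile_for_device(model):
    """Wrap a model with torch.compile using the backend chosen above."""
    import torch  # imported lazily; pick_backend itself needs no PyTorch

    capability = (
        tuple(torch.cuda.get_device_capability())
        if torch.cuda.is_available()
        else None
    )
    return torch.compile(model, backend=pick_backend(capability))
```

A call such as `compiled = compile_for_device(my_model)` (where `my_model` is any `torch.nn.Module`) keeps the same code path on both old and new GPUs, at the cost of losing Triton-generated kernels on the older ones.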

Ineffective attempts

Common approaches that do not work:

  1. 70% failure rate

    The error is a hardware limitation, not a software issue.

  2. 90% failure rate

    Triton uses its own JIT compiler, independent of TensorExpr.