# RuntimeError: Triton compilation failed: LLVM error: out of memory while compiling a kernel with large shared memory

- **ID:** `cuda/triton-compilation-llvm-crash`
- **Domain:** cuda
- **Category:** build_error
- **Verification level:** ai_generated
- **Fix rate:** 72%

## Root Cause

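The Triton JIT compiler invokes LLVM to optimize kernel code. When a kernel uses too much shared memory (more than 48 KB per block on most GPUs), the memory LLVM allocates for register spilling or optimization exceeds the available host memory, and compilation aborts.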
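To make the 48 KB figure concrete, the sketch below works out how many elements of a given width fit in one block's shared memory. The limit is the one quoted above; the helper name `elements_that_fit` and the element widths are illustrative assumptions, not part of Triton.

```python
# 48 KB per block: the default limit cited above for most GPUs.
SHARED_MEM_LIMIT_BYTES = 48 * 1024


def elements_that_fit(elem_bytes, limit=SHARED_MEM_LIMIT_BYTES):
    """Return how many elements of `elem_bytes` bytes fit under `limit`."""
    if elem_bytes <= 0:
        raise ValueError("elem_bytes must be positive")
    return limit // elem_bytes


if __name__ == "__main__":
    for name, width in (("float16", 2), ("float32", 4), ("float64", 8)):
        print(f"{name}: {elements_that_fit(width)} elements")
```

A single 128 x 128 float32 tile holds 16384 elements, which is already above the 12288 float32 elements that fit under this limit.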

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| Triton 2.3.0 | active | — | — |
| CUDA 12.5 | active | — | — |
| LLVM 18.1.0 | active | — | — |
| PyTorch 2.5.0 | active | — | — |

## Solutions

1. Reduce shared memory usage in the Triton kernel: decrease the block size or use fewer shared memory allocations. For example, change the `tl.constexpr` `BLOCK_SIZE` from 128 to 64, and make sure shared memory is allocated per block, not per thread.
2. Set the environment variable `TRITON_MAX_SHARED_MEMORY` to a lower value (e.g., 32768 bytes) to force Triton to generate kernels within limits. Run `export TRITON_MAX_SHARED_MEMORY=32768` before running the script.
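The block-size reduction in step 1 can be sanity-checked before launching anything. The following is a minimal sketch, assuming a tiled matmul-style kernel in which each pipeline stage keeps one A tile and one B tile in shared memory. The function names, the default `block_k`, element width, and `num_stages` are hypothetical values chosen for illustration; the real footprint depends on the kernel and on what Triton's compiler decides to stage.

```python
# 48 KB per block: the default limit cited in the root cause.
SHARED_MEM_LIMIT_BYTES = 48 * 1024


def matmul_smem_bytes(block_m, block_n, block_k, elem_bytes=4, num_stages=2):
    """Rough shared-memory estimate for a pipelined tiled matmul.

    Assumption: every stage holds an A tile (block_m x block_k) and a
    B tile (block_k x block_n) of `elem_bytes`-wide elements.
    """
    per_stage = (block_m * block_k + block_k * block_n) * elem_bytes
    return per_stage * num_stages


def largest_fitting_block(start, block_k=32, elem_bytes=4, num_stages=2,
                          limit=SHARED_MEM_LIMIT_BYTES, minimum=16):
    """Halve a square block size until the estimate fits under `limit`.

    Returns the first block size that fits, or None if even `minimum`
    is too large.
    """
    block = start
    while block >= minimum:
        needed = matmul_smem_bytes(block, block, block_k, elem_bytes, num_stages)
        if needed <= limit:
            return block
        block //= 2
    return None


if __name__ == "__main__":
    for size in (128, 64):
        print(size, matmul_smem_bytes(size, size, 32), "bytes")
    print("suggested BLOCK_SIZE:", largest_fitting_block(128))
```

Under these assumptions a 128-wide block needs 65536 bytes and a 64-wide block needs 32768 bytes, so halving from 128 to 64 is what brings the estimate back under the 48 KB limit.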

## Failed Attempts

- The error is not about total system memory but about LLVM's internal allocation limits during compilation, so adding more RAM does not help if the kernel design is flawed. (90% failure rate)
- Caching is unrelated to compilation memory; it only affects reuse of compiled kernels. (95% failure rate)