cudaErrorStreamOrderViolation
cuda
runtime_error
ai_generated
true
CUDA error: stream-order violation during graph launch (cudaErrorStreamOrderViolation)
ID: cuda/stream-order-violation-cuda-graph
Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2024-01-10
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.7 | active | — | — | — |
| CUDA 12.2 | active | — | — | — |
| PyTorch 2.1 | active | — | — | — |
| PyTorch 2.3 | active | — | — | — |
Root cause analysis
A CUDA graph is launched on a stream that still has pending operations originating from a different stream or graph, which violates the implicit ordering constraints that apply when CUDA graph capture is used.
Official documentation
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graph-stream-order
Solutions
- Make sure all pending work on the target stream has completed before launching the graph. Call `torch.cuda.synchronize()` or a stream-level synchronization primitive before `cudaGraphLaunch` (in PyTorch, before `torch.cuda.CUDAGraph.replay()`).
- Re-capture the graph on a dedicated stream that is not used for any other operations, so that the graph has no cross-stream dependencies.
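The two solutions above can be combined in PyTorch roughly as follows. This is a minimal sketch, not a confirmed fix for every occurrence of the error: the function name `run_graph_on_dedicated_stream` and the variables `capture_stream`, `static_in`, and `static_out` are hypothetical, and the toy computation stands in for a real model. It assumes a PyTorch build with CUDA graph support and returns `None` when no CUDA device is available.

```python
try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def run_graph_on_dedicated_stream():
    """Capture a tiny CUDA graph on its own stream, then replay it after a full sync.

    Returns the output as a Python list, or None if CUDA is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    # Static input buffer: graph replays always read from this same memory.
    static_in = torch.zeros(4, device="cuda")

    # Dedicated stream (hypothetical name) used only for warm-up, capture, and replay.
    capture_stream = torch.cuda.Stream()

    # Warm up on the dedicated stream, ordered after work already queued
    # on the current stream.
    capture_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(capture_stream):
        for _ in range(3):
            static_out = static_in * 2 + 1
    torch.cuda.current_stream().wait_stream(capture_stream)

    # Capture the graph on the dedicated stream.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph, stream=capture_stream):
        static_out = static_in * 2 + 1

    # Write new input data; this copy is queued on the current stream.
    static_in.copy_(torch.arange(4, device="cuda", dtype=torch.float32))

    # Drain all pending work on the device before launching the graph,
    # so the replay does not race with the copy above.
    torch.cuda.synchronize()
    with torch.cuda.stream(capture_stream):
        graph.replay()
    torch.cuda.synchronize()  # wait for the replay before reading the result

    return static_out.tolist()


if __name__ == "__main__":
    print(run_graph_on_dedicated_stream())
```

`torch.cuda.synchronize()` blocks the host until the whole device is idle, which is the bluntest option. Where that cost matters, `capture_stream.wait_stream(torch.cuda.current_stream())` before the replay orders only the two streams involved.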
Ineffective attempts
Common but ineffective approaches:
- 90% failure rate: The error is about stream synchronization, not parallelism; adding workers can introduce more streams and worsen the violation.
- 60% failure rate: This removes the performance benefit but does not fix the underlying stream management; the error may reappear if graphs are re-enabled.