cudaErrorStreamOrderViolation
cuda
runtime_error
ai_generated
true
CUDA error: stream-order violation during graph launch (cudaErrorStreamOrderViolation)
ID: cuda/stream-order-violation-cuda-graph
Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2024-01-10
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 11.7 | active | — | — | — |
| CUDA 12.2 | active | — | — | — |
| PyTorch 2.1 | active | — | — | — |
| PyTorch 2.3 | active | — | — | — |
Root Cause
A CUDA graph is launched on a stream that still has pending operations from a different stream or graph. This violates the implicit ordering constraints that apply when CUDA graph capture is used.
Official Documentation
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graph-stream-order
Workarounds
- 85% success: Ensure all operations on the target stream are synchronized before launching a graph. Use `torch.cuda.synchronize()` or stream synchronization primitives before `cudaGraphLaunch`.
- 90% success: Re-capture the graph on a dedicated stream that is not used for other operations, ensuring no cross-stream dependencies.
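The two workarounds above can be combined in PyTorch roughly as follows. This is a minimal sketch rather than a definitive fix: the helper name `capture_and_replay`, the variable names, and the toy computation `static_in * 2 + 1` are illustrative assumptions, and the function simply returns `None` when PyTorch or a CUDA device is not available.

```python
try:
    import torch
except ImportError:  # PyTorch not installed; the helper below then returns None
    torch = None


def capture_and_replay(n=1024):
    """Capture a tiny graph on a dedicated stream, then replay it after a sync.

    Hypothetical helper for illustration only.
    """
    if torch is None or not torch.cuda.is_available():
        return None  # nothing to demonstrate without a CUDA device

    # Static input buffer: graph replays always read from this same memory.
    static_in = torch.zeros(n, device="cuda")

    # Workaround 2: a dedicated stream used only for warm-up and capture.
    capture_stream = torch.cuda.Stream()
    capture_stream.wait_stream(torch.cuda.current_stream())

    # Warm up on the side stream so lazy initialization is not captured.
    with torch.cuda.stream(capture_stream):
        for _ in range(3):
            static_out = static_in * 2 + 1
    torch.cuda.current_stream().wait_stream(capture_stream)

    # Capture the work on the dedicated stream.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph, stream=capture_stream):
        static_out = static_in * 2 + 1

    # Update the input in place, then apply workaround 1: drain all pending
    # device work before launching the graph.
    static_in.fill_(3.0)
    torch.cuda.synchronize()
    graph.replay()
    torch.cuda.synchronize()  # wait for the replay before reading the result

    return static_out.cpu()


if __name__ == "__main__":
    print(capture_and_replay(8))
```

`torch.cuda.synchronize()` is the coarsest option because it waits for every stream on the device. Where that is too expensive, synchronizing only the stream involved (for example with `torch.cuda.Stream.synchronize()` or `wait_stream`) may be enough, assuming no other stream touches the graph's buffers.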
Dead Ends
Common approaches that don't work:
- 90% fail: Adding workers does not help. The error is about stream synchronization, not parallelism, and extra workers can introduce more streams and worsen the violation.
- 60% fail: Disabling CUDA graphs removes the performance benefit but does not fix the underlying stream management; the error may reappear if graphs are re-enabled.