cudaErrorStreamOrderViolation cuda runtime_error ai_generated true

CUDA error: stream-order violation during graph launch (cudaErrorStreamOrderViolation)

ID: cuda/stream-order-violation-cuda-graph

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2024-01-10

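Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| CUDA 11.7 | active | | | |
| CUDA 12.2 | active | | | |
| PyTorch 2.1 | active | | | |
| PyTorch 2.3 | active | | | |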

Root cause analysis

A CUDA graph is launched on a stream that still has pending operations from a different stream or graph. This violates the implicit ordering constraints that apply when CUDA graph capture is in use.

generic

Official documentation

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graph-stream-order

Solutions

  1. Ensure all pending operations on the target stream have completed before launching a graph. In PyTorch, call `torch.cuda.synchronize()`; in CUDA C++, use a stream synchronization primitive such as `cudaStreamSynchronize` before `cudaGraphLaunch`.
  2. Re-capture the graph on a dedicated stream that is not used for other operations, ensuring no cross-stream dependencies.
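
The two steps above can be sketched in PyTorch as follows. This is a minimal illustration, not a verified fix for any particular application: the function name `capture_and_replay`, the buffers `static_x` and `static_y`, and the stream `side` are hypothetical names chosen for the example. The warm-up and `wait_stream` calls follow the pattern commonly shown for `torch.cuda.graph`, and the function simply returns `None` when PyTorch or a CUDA device is unavailable.

```python
try:
    import torch
except ImportError:  # allows the sketch to load on machines without PyTorch
    torch = None


def capture_and_replay(n=1024):
    """Capture y = 2x + 1 on a dedicated stream, then replay the graph.

    Returns the replayed output as a CPU tensor, or None when PyTorch or a
    CUDA device is not available. All names here are illustrative.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    # Static input buffer: every replay reads from this same memory.
    static_x = torch.zeros(n, device="cuda")

    # Step 2: a dedicated side stream used only for warm-up and capture.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):  # warm-up iterations before capture
            static_y = static_x * 2 + 1
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph, stream=side):
        static_y = static_x * 2 + 1  # recorded into the graph, not executed

    # Step 1: refill the input, then drain all pending work before launch.
    static_x.copy_(torch.arange(n, device="cuda", dtype=torch.float32))
    torch.cuda.synchronize()
    graph.replay()
    torch.cuda.synchronize()  # wait for the replay before reading the result
    return static_y.cpu()
```

Calling `capture_and_replay(8)` on a CUDA-capable machine should return the values `2 * i + 1` for `i` in `0..7`. The full-device `torch.cuda.synchronize()` before `replay()` is the bluntest option; once the ordering problem is understood, a narrower `wait_stream` or per-stream `synchronize()` may be sufficient.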

Ineffective attempts

Common but ineffective approaches:

  1. 90% failure rate

    The error is about stream synchronization, not parallelism; adding workers can introduce more streams and worsen the violation.

  2. 60% failure rate

    Disabling CUDA graphs removes their performance benefit but does not fix the underlying stream management; the error may reappear if graphs are re-enabled.