cudaErrorGraphUpdateViolation cuda runtime_error ai_generated true

RuntimeError: CUDA error: the graph update was not performed because it included changes which violated constraints specific to instantiated CUDA graphs (cudaErrorGraphUpdateViolation) - memory pool mismatch

ID: cuda/cudagraph-memory-pool-mismatch

Fix rate: 80%
Confidence: 86%
Evidence count: 1
First seen: 2024-03-10

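Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 12.0 | active | | | |
| CUDA 12.3 | active | | | |
| PyTorch 2.3.0 | active | | | |
| NVIDIA Driver 545.23 | active | | | |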

Root cause analysis

When a CUDA graph instantiated through `cudaGraphInstantiate` with `cudaGraphInstantiateFlagAutoFreeOnLaunch` is updated (in-place updates of an instantiated graph are applied with `cudaGraphExecUpdate`), the update is rejected if the new graph nodes reference a different memory pool than the original instantiation. The graph's memory pool is fixed at instantiation and cannot be changed by an update.

generic

Official documentation

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH_1g1a5b9a2b8c3f4e5d6a7b8c9d0e1f2a3b

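Solutions

  1. Make sure every tensor used in the graph comes from the same memory pool: create a custom pool with `torch.cuda.cudart().cudaMemPoolSetAttribute` and explicitly assign it to all tensors before graph capture. Alternatively, avoid `cudaGraphInstantiateFlagAutoFreeOnLaunch` and manage memory manually.
  2. Instead of updating the graph, capture a new graph whenever the memory pool changes. Use `torch.cuda.CUDAGraph` and call `graph.replay()` only while the input shapes and the memory pool are unchanged. If either changes, call `graph.capture_begin()` again to recapture.
  3. Set the environment variable `CUDA_GRAPH_DEBUG=1` to enable verbose logging in the CUDA graph runtime, which prints memory pool addresses and helps identify the node causing the mismatch.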
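
The replay-or-recapture rule from solution 2 can be sketched in a framework-agnostic way. The snippet below is only an illustration under stated assumptions: `GraphCache`, its `capture` and `replay` callbacks, and the `pool_id` strings are hypothetical names and are not part of PyTorch or CUDA. In a real PyTorch program the `capture` callback would presumably wrap a `torch.cuda.CUDAGraph` capture (for example via `torch.cuda.graph(g, pool=...)`, with a pool handle obtained from `torch.cuda.graph_pool_handle()`), and `replay` would call `graph.replay()`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable, Tuple

# Hypothetical cache key: (input shape, memory pool identifier).
Key = Tuple[Tuple[int, ...], Hashable]


@dataclass
class GraphCache:
    """Hypothetical helper: reuse a captured graph only while the input
    shape and the memory pool are unchanged; otherwise capture a new one."""

    capture: Callable[[Tuple[int, ...], Hashable], Any]  # records a new graph
    replay: Callable[[Any], None]                        # launches a recorded graph
    graphs: Dict[Key, Any] = field(default_factory=dict)
    captures: int = 0

    def run(self, shape, pool_id) -> Any:
        key = (tuple(shape), pool_id)
        graph = self.graphs.get(key)
        if graph is None:
            # Shape or pool changed: capture a fresh graph rather than
            # trying to update the previously instantiated one.
            graph = self.capture(key[0], pool_id)
            self.graphs[key] = graph
            self.captures += 1
        # Capture only records work, so the graph is replayed in both cases.
        self.replay(graph)
        return graph


# Stand-in callbacks so the sketch runs without a GPU.
replayed = []
cache = GraphCache(
    capture=lambda shape, pool: {"shape": shape, "pool": pool},
    replay=replayed.append,
)

cache.run((8, 16), "pool-a")  # first call: capture, then replay
cache.run((8, 16), "pool-a")  # same shape and pool: replay only
cache.run((8, 16), "pool-b")  # pool changed: recapture, then replay
print(cache.captures, len(replayed))
```

Keeping old graphs in the cache trades memory for capture time; a real implementation would likely evict entries whose pool is no longer in use.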

Ineffective attempts

Common approaches that do not work:

  1. Calling `torch.cuda.empty_cache()` before graph capture to free memory (90% failure rate)

    Emptying the cache does not change the memory pool assignment; the graph will still capture from the default pool, and the update will still fail if the pool changes.

  2. Using `cudaGraphInstantiate` without the `AutoFreeOnLaunch` flag (70% failure rate)

    While this avoids the pool mismatch error, it disables automatic memory management and may lead to memory leaks or performance degradation.

  3. Rebuilding the entire graph from scratch instead of updating (60% failure rate)

    Rebuilding works but is inefficient; the error is about update constraints, not about graph creation itself.