# RuntimeError: CUDA error: the graph update was not performed because it included changes which violated constraints specific to instantiated CUDA graphs (cudaErrorGraphUpdateViolation) - memory pool mismatch

- **ID:** `cuda/cudagraph-memory-pool-mismatch`
- **Domain:** cuda
- **Category:** runtime_error
- **Error Code:** `cudaErrorGraphUpdateViolation`
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

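An instantiated graph (`cudaGraphExec_t`) is updated in place with `cudaGraphExecUpdate`; `cudaGraphInstantiate` always builds a new executable graph and does not perform updates. The update is rejected when the nodes of the new graph reference a different memory pool than the one used at the original instantiation, for example a graph instantiated with `cudaGraphInstantiateFlagAutoFreeOnLaunch`, because the memory pool of an executable graph is fixed once it is instantiated. The CUDA Runtime API documents this message under the error name `cudaErrorGraphExecUpdateFailure`.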

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| CUDA 12.0 | active | — | — |
| CUDA 12.3 | active | — | — |
| PyTorch 2.3.0 | active | — | — |
| NVIDIA Driver 545.23 | active | — | — |

## Workarounds

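1. **Make every capture allocate from the same memory pool. In PyTorch, allocations made during capture come from the pool passed to the capture, so obtain a handle with `torch.cuda.graph_pool_handle()` (or `graph.pool()` from an earlier capture) and pass it as `pool=` to `torch.cuda.graph(...)` or `CUDAGraph.capture_begin(...)`. At the CUDA level, create the pool with `cudaMemPoolCreate`; `cudaMemPoolSetAttribute` only sets attributes on an existing pool. Alternatively, avoid `cudaGraphInstantiateFlagAutoFreeOnLaunch` and free the graph's allocations explicitly.** (85% success)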
2. **Instead of updating the graph, capture a new one whenever the memory pool changes. Use `torch.cuda.CUDAGraph` and call `graph.replay()` only while the input shapes and the memory pool are unchanged. When either changes, capture into a fresh `torch.cuda.CUDAGraph()` instance rather than calling `capture_begin()` again on a graph that already holds a capture.** (90% success)
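3. **Identify the node that blocks the update. At the CUDA level, read the `cudaGraphExecUpdateResultInfo` filled in by `cudaGraphExecUpdate`; its `errorNode` field names the offending node. In PyTorch, call `graph.enable_debug_mode()` before capture and `graph.debug_dump(path)` afterwards to write the captured graph out for inspection. `CUDA_GRAPH_DEBUG` is not among the environment variables NVIDIA documents for CUDA, so do not rely on it for logging.** (75% success)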
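
Workarounds 1 and 2 can be combined in a small cache that replays while the key is unchanged and captures a fresh graph otherwise. The following is a minimal sketch under stated assumptions, not PyTorch's own mechanism: `GraphCache` and `make_torch_capture` are hypothetical names, the key is assumed to be `(shape, dtype)`, and `fn` is assumed to take a single input tensor. The PyTorch calls it uses (`torch.cuda.CUDAGraph`, `torch.cuda.graph(..., pool=...)`, `replay()`) exist, but the helper needs a CUDA device to run.

```python
from typing import Any, Callable, Dict, Hashable

# A "runner" copies new inputs into the captured buffers, replays, returns the output.
Runner = Callable[[Any], Any]


class GraphCache:
    """Replay a captured graph while the key is unchanged; recapture otherwise."""

    def __init__(self, capture: Callable[[Hashable], Runner]) -> None:
        self._capture = capture            # callback that performs one capture
        self._runners: Dict[Hashable, Runner] = {}
        self.captures = 0                  # number of captures actually performed

    def __call__(self, key: Hashable, x: Any) -> Any:
        runner = self._runners.get(key)
        if runner is None:
            # New shape/dtype (or pool): capture a fresh graph instead of updating.
            runner = self._capture(key)
            self._runners[key] = runner
            self.captures += 1
        return runner(x)


def make_torch_capture(fn: Callable[[Any], Any], pool: Any = None) -> Callable[[Hashable], Runner]:
    """Build a capture callback for GraphCache using PyTorch (requires a CUDA device)."""
    import torch  # imported lazily so GraphCache itself has no GPU dependency

    def capture(key: Hashable) -> Runner:
        shape, dtype = key  # assumption: key is (shape tuple, torch dtype)
        static_in = torch.zeros(shape, dtype=dtype, device="cuda")

        # Warm up on a side stream before capturing.
        side = torch.cuda.Stream()
        side.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side):
            fn(static_in)
        torch.cuda.current_stream().wait_stream(side)

        graph = torch.cuda.CUDAGraph()  # fresh instance for every capture
        # pool=None gives this capture a private pool; pass a shared handle
        # from torch.cuda.graph_pool_handle() to reuse one pool across captures.
        with torch.cuda.graph(graph, pool=pool):
            static_out = fn(static_in)

        def runner(x: Any) -> Any:
            static_in.copy_(x)   # refill the captured input buffer
            graph.replay()
            return static_out    # overwritten by the next replay of this graph

        return runner

    return capture
```

With a CUDA device, usage might look like `cache = GraphCache(make_torch_capture(model))` followed by `y = cache((tuple(x.shape), x.dtype), x)`. Passing a shared `pool` is optional: PyTorch's notes on CUDA graphs describe sharing a pool as safe only when the graphs are replayed in the order they were captured and never concurrently, so a cache that replays in arbitrary order may be better served by the private pools that `pool=None` provides.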

## Dead Ends

- **Calling `torch.cuda.empty_cache()` before graph capture to free memory** — Emptying the cache releases unused cached blocks but does not change which memory pool a capture allocates from, so the update still fails if the pool changes. (90% fail)
- **Dropping the `cudaGraphInstantiateFlagAutoFreeOnLaunch` flag from `cudaGraphInstantiate` without any other change** — This avoids the pool mismatch error but also disables automatic memory management; unless the graph's allocations are then freed explicitly, it may lead to memory leaks or performance degradation. (70% fail)
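- **Rebuilding the entire graph from scratch on every launch instead of updating** — Rebuilding works but pays the full capture and instantiation cost each time. The error concerns update constraints, not graph creation itself, so rebuild only when the memory pool or input shapes actually change. (60% fail)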
