cudaErrorStreamCaptureInvalidated
cuda
runtime_error
ai_generated
true
RuntimeError: CUDA error: operation not permitted when stream is capturing (streamCaptureInvalidated)
ID: cuda/stream-capture-invalid-scope
Fix Rate: 81%
Confidence: 87%
Evidence: 1
First Seen: 2024-09-05
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| CUDA 12.0 | active | — | — | — |
| PyTorch 2.1.0 | active | — | — | — |
| NVIDIA Driver 535.129.03 | active | — | — | — |
Root Cause
A CUDA graph capture is in progress on a stream, but an operation (e.g., memory allocation, host-side sync) that is invalid during capture was attempted, invalidating the capture.
Official Documentation
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
Workarounds
- 88% success: Move all memory allocations and host-device synchronization outside the capture scope. Example: pre-allocate input tensors before calling torch.cuda.CUDAGraph.capture_begin() (or before entering the torch.cuda.graph() context manager), and call torch.cuda.synchronize() only after capture ends.
- 80% success: Use cudaStreamBeginCapture with cudaStreamCaptureModeRelaxed to permit more API calls during capture. cudaStreamCaptureModeGlobal is the strictest mode: it prohibits potentially unsafe calls from any thread while a capture is in progress. Even in relaxed mode, ensure that no host-side blocking calls occur during capture. In PyTorch, defer anything that forces a host sync (for example printing a CUDA tensor or calling .item()) until the capture context manager has exited.
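The first workaround can be sketched in PyTorch roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name build_graphed_step and the toy function passed to it are hypothetical, and it assumes a CUDA-capable device. Inputs are allocated before capture, warm-up runs on a side stream so that lazy initialization happens outside the capture, and the only host sync comes after the capture context has exited.

```python
import torch


def build_graphed_step(fn, static_input, warmup_iters=3):
    """Capture fn(static_input) into a CUDA graph.

    Hypothetical helper for illustration. Returns (graph, static_output);
    static_output is overwritten in place on every graph.replay().
    """
    # Warm up on a side stream so one-time setup (library handles,
    # allocator growth) happens before capture, not during it.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(warmup_iters):
            fn(static_input)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        # No host syncs in here: no .item(), .cpu(), printing of CUDA
        # tensors, or torch.cuda.synchronize().
        static_output = fn(static_input)

    # Host-side synchronization is safe again once capture has ended.
    torch.cuda.synchronize()
    return graph, static_output


if torch.cuda.is_available():
    # Pre-allocate the input buffer outside the capture scope.
    x = torch.zeros(4, device="cuda")
    graph, y = build_graphed_step(lambda t: t * 2 + 1, x)

    # Feed new data by copying into the static buffer, then replay.
    x.copy_(torch.arange(4.0, device="cuda"))
    graph.replay()
    torch.cuda.synchronize()
    print(y.tolist())
```

Tensors created inside the torch.cuda.graph() block are served by PyTorch's caching allocator from a graph-private pool, which is why the output can be produced during capture, whereas raw allocation calls and host syncs cannot be issued there.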
Dead Ends
Common approaches that don't work:
- 92% fail: This disables cuDNN heuristics but does not fix the capture violation; the error will recur if capture is attempted again.
- 98% fail: Thread configuration is unrelated to capture validity; the error is about operations allowed during capture.