RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
ID: pytorch/dataset-worker-fork-cuda
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| PyTorch 1.13.0 | active | — | — | — |
| PyTorch 2.0.0 | active | — | — | — |
| CUDA 11.7 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
| Ubuntu 20.04 | active | — | — | — |
| Ubuntu 22.04 | active | — | — | — |
Root Cause
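On Linux, the default 'fork' start method creates each DataLoader worker (num_workers>0) as a copy of the parent process. CUDA state cannot be carried across a fork. If the parent has already initialized CUDA, any CUDA call inside a worker fails because CUDA cannot be re-initialized in a forked subprocess. Typical triggers are .cuda() or .to('cuda') inside Dataset.__getitem__ or collate_fn.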
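The mechanism can be modeled with the standard library alone. The sketch below is a toy, not PyTorch code. The hypothetical LazyRuntime class stands in for the CUDA runtime: it initializes lazily on first use and uses os.register_at_fork to mark itself unusable in any child forked after it was initialized. It assumes a POSIX system where os.fork is available. The point it illustrates is that the failure depends on whether the parent initialized the runtime before forking.

```python
import os


class LazyRuntime:
    """Hypothetical stand-in for a lazily initialized GPU runtime (not PyTorch code)."""

    def __init__(self):
        self.initialized = False
        self.bad_fork = False
        # Called in the child process right after every os.fork().
        os.register_at_fork(after_in_child=self._after_fork_in_child)

    def _after_fork_in_child(self):
        # State created by an already-initialized parent cannot be reused here.
        if self.initialized:
            self.bad_fork = True

    def use(self):
        if self.bad_fork:
            raise RuntimeError("Cannot re-initialize runtime in forked subprocess")
        self.initialized = True  # lazy initialization on first use
        return "ok"


def child_can_use(runtime):
    """Fork a child that calls runtime.use(); return True if the child succeeded."""
    pid = os.fork()
    if pid == 0:  # child process
        code = 1
        try:
            runtime.use()
            code = 0
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


fresh = LazyRuntime()
print(child_can_use(fresh))  # prints True: the parent never initialized it

used = LazyRuntime()
used.use()
print(child_can_use(used))  # prints False: the parent initialized it before forking
```

The parent process itself keeps working in both cases. Only the forked child is refused, which mirrors how the real error surfaces inside DataLoader workers rather than in the training loop.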
Official Documentation
https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
Workarounds
- 95% success: Set the start method to 'spawn' at the beginning of the script: import multiprocessing as mp; mp.set_start_method('spawn', force=True). Spawned processes start from a fresh Python interpreter rather than a copy of the parent, so they do not inherit the parent's CUDA state and can initialize CUDA on their own.
- 85% success: Keep all CUDA work out of the worker processes. Have Dataset.__getitem__ and collate_fn return CPU tensors, set pin_memory=True, and move each batch to the GPU in the main process after the DataLoader returns it (for example batch.to('cuda', non_blocking=True)). With num_workers>0, collate_fn also runs inside the workers, so it must not move data to the GPU either.
- 90% success: Wrap the training code in an if __name__ == '__main__': block and call mp.set_start_method('spawn') inside it before any CUDA call. Example: if __name__ == '__main__': mp.set_start_method('spawn'); train(). The guard is required because spawned workers re-import the main module; without it, each worker would re-execute the script's top-level training code.
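The spawn-based workarounds above can be combined into one script skeleton. This is a sketch that assumes a PyTorch release with DataLoader's multiprocessing_context argument (present in 1.13 and 2.0). SquaresDataset, make_loader and main are hypothetical names. It uses torch.multiprocessing, PyTorch's wrapper around the standard module, which exposes the same set_start_method. multiprocessing_context='spawn' applies spawn to one loader's workers only. The global set_start_method call under the guard is the broader alternative, and either one alone is enough for the DataLoader.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset; it must live at module level so spawned workers can import it."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor([float(idx) ** 2])


def make_loader(dataset, num_workers=2):
    # Per-loader spawn: these workers start as fresh interpreters even if the
    # process-wide default start method is still 'fork'.
    return DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        multiprocessing_context="spawn",
        pin_memory=torch.cuda.is_available(),
    )


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # The parent initializes CUDA here; spawned workers do not inherit that state.
    model = torch.nn.Linear(1, 1).to(device)
    with torch.no_grad():
        for batch in make_loader(SquaresDataset(16)):
            out = model(batch.to(device, non_blocking=True))
            print(tuple(out.shape))


if __name__ == "__main__":
    # Global alternative to multiprocessing_context="spawn"; force=True avoids a
    # RuntimeError if a start method was already set. The guard keeps spawned
    # workers, which re-import this module, from re-running main().
    mp.set_start_method("spawn", force=True)
    main()
```

Creating the DataLoader does not start any workers; they are spawned when the loop in main() begins iterating.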
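The CPU-only-workers workaround can be sketched as follows, assuming the default fork start method on Linux. RangeDataset, cpu_collate and total_on_device are hypothetical names. The workers may be forked even after the parent has initialized CUDA, because nothing they execute touches the GPU. The only host-to-device copy happens in the main loop.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RangeDataset(Dataset):
    """Hypothetical toy dataset: returns CPU tensors and never calls CUDA."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor([float(idx)])


def cpu_collate(samples):
    # With num_workers > 0 this runs inside the worker processes,
    # so it only stacks tensors on the CPU.
    return torch.stack(samples)


def total_on_device(n=10, num_workers=2):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Allocating on the device may initialize CUDA in the parent before the
    # workers are forked; that is safe because the workers never use CUDA.
    total = torch.zeros(1, device=device)
    loader = DataLoader(
        RangeDataset(n),
        batch_size=4,
        num_workers=num_workers,
        collate_fn=cpu_collate,
        pin_memory=torch.cuda.is_available(),  # page-locked batches for faster copies
    )
    for batch in loader:
        # The host-to-device copy happens here, in the main process.
        total += batch.to(device, non_blocking=True).sum()
    return total.item()


if __name__ == "__main__":
    print(total_on_device())  # prints 45.0
```

The design choice is to keep the fast fork-based workers and make them safe, rather than changing how they are started.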
Dead Ends
Common approaches that don't work:
- 60% fail: Set num_workers=0 in DataLoader to disable multiprocessing. This avoids the error, but it removes parallel data loading entirely, which significantly slows training on large datasets or with heavy preprocessing.
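- 95% fail: Move CUDA operations to after the DataLoader workers are created. This only postpones the problem. Unless persistent_workers=True, DataLoader starts new workers every time it is iterated (every epoch), and by then the parent has usually already initialized CUDA. Any CUDA call inside a worker therefore still fails.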
- 90% fail: Call torch.cuda.set_device(0) inside worker_init_fn. worker_init_fn runs inside the already-forked worker, so torch.cuda.set_device is itself a CUDA call in a forked subprocess and hits the same error. It cannot repair CUDA state that did not survive the fork.