RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
ID: pytorch/dataset-worker-fork-cuda
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| PyTorch 1.13.0 | active | — | — | — |
| PyTorch 2.0.0 | active | — | — | — |
| CUDA 11.7 | active | — | — | — |
| CUDA 12.1 | active | — | — | — |
| Ubuntu 20.04 | active | — | — | — |
| Ubuntu 22.04 | active | — | — | — |
Root cause analysis
On Linux, the default 'fork' start method creates DataLoader worker processes as forked copies of the parent. If the parent has already initialized CUDA, each child inherits that CUDA context, but CUDA does not support re-initialization in a forked process. The error is raised as soon as a worker tries to use CUDA.
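The inheritance described above can be illustrated without a GPU. The following is a minimal sketch, assuming a Unix system where os.fork is available; the dictionary _fake_context and both helper functions are hypothetical stand-ins for the parent's CUDA context and are not part of PyTorch or CUDA.

```python
import os

# Stand-in for the parent's CUDA context (hypothetical; no real CUDA involved).
_fake_context = {"initialized": False, "owner_pid": None}


def init_fake_context():
    """Record that the current process initialized the stand-in context."""
    _fake_context["initialized"] = True
    _fake_context["owner_pid"] = os.getpid()


def forked_child_inherits_context():
    """Fork once and report whether the child sees a context it did not create."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its memory is a copy of the parent's, so the dict is already set.
        try:
            os.close(read_fd)
            inherited = (
                _fake_context["initialized"]
                and _fake_context["owner_pid"] != os.getpid()
            )
            os.write(write_fd, b"1" if inherited else b"0")
        finally:
            # Always leave the child here so it never runs the parent's code path.
            os._exit(0)
    # Parent: read the child's one-byte answer, then reap the child.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


if __name__ == "__main__":
    init_fake_context()
    print(forked_child_inherits_context())
```

The forked child finds state that was set up by a different process. For a real CUDA context that inherited state is unusable, which is why PyTorch refuses to initialize CUDA in the child.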
Official documentation
https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
Solutions
- Set the start method to 'spawn' at the beginning of the script: import multiprocessing as mp; mp.set_start_method('spawn', force=True). Spawned workers start from a fresh interpreter and do not inherit the parent's CUDA context.
- Keep the Dataset and collate_fn CPU-only. With num_workers>0 both run inside the worker processes, so any CUDA call there triggers this error. Return CPU tensors, set pin_memory=True, and move each batch to the GPU in the main process after it has been loaded.
- Wrap the training code in an if __name__ == '__main__': block and call mp.set_start_method('spawn') before any CUDA calls. Example: if __name__ == '__main__': mp.set_start_method('spawn'); train()
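The ordering that these solutions rely on can be sketched as follows. This is only an outline under stated assumptions: configure_start_method and train are hypothetical names, and train is a placeholder for the real training code rather than a working training loop.

```python
import multiprocessing as mp


def configure_start_method(method="spawn"):
    """Select the start method; call this before any CUDA call or DataLoader."""
    # force=True overrides a start method that was already set in this process.
    mp.set_start_method(method, force=True)
    return mp.get_start_method()


def train():
    """Hypothetical placeholder for the real training entry point."""
    # Real code would build the model and a DataLoader with num_workers > 0 here,
    # keep the Dataset and collate_fn on CPU, and move each batch to the GPU
    # in this (main) process.
    return "done"


if __name__ == "__main__":
    # Spawned workers re-import this module; the guard keeps them from
    # re-running the training code.
    print("start method:", configure_start_method())
    print(train())
```

If changing the global start method is too invasive, DataLoader also accepts a multiprocessing_context argument (for example multiprocessing_context='spawn') that applies to that loader's workers only.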
Ineffective attempts
Common but ineffective approaches:
- Set num_workers=0 in DataLoader to disable multiprocessing
  Failure rate: 60%
  This eliminates parallelism entirely and significantly slows data loading, especially for large datasets or heavy preprocessing.
- Move CUDA operations to after DataLoader workers are created
  Failure rate: 95%
  The error occurs because workers inherit the CUDA context from the parent; reordering the operations does not change that inheritance.
- Use torch.cuda.set_device(0) inside worker_init_fn
  Failure rate: 90%
  Setting the device after the fork does not resolve the CUDA re-initialization issue; the inherited context is already corrupted.