CUDA_ERROR_NOT_INITIALIZED pytorch config_error ai_generated true

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.

ID: pytorch/dataset-worker-fork-cuda

90% Fix Rate

90% Confidence

1 Evidence

First Seen: 2023-01-10

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
PyTorch 1.13.0	active	—	—	—
PyTorch 2.0.0	active	—	—	—
CUDA 11.7	active	—	—	—
CUDA 12.1	active	—	—	—
Ubuntu 20.04	active	—	—	—
Ubuntu 22.04	active	—	—	—

Root Cause

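With the default 'fork' start method on Linux, each DataLoader worker starts as a copy of the parent process. If the parent has already initialized CUDA before the workers are forked (for example by moving the model or a tensor to the GPU), the children inherit that initialized state. The CUDA runtime cannot be re-initialized in a forked child, so any CUDA call made inside a worker raises this error. Examples are .cuda() or .to('cuda') in a Dataset's __getitem__ or in a collate_fn.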
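
The inheritance problem can be illustrated without a GPU. The sketch below is a simulation, not real CUDA code. The module-level flag _cuda_initialized and the helpers fake_cuda_init and child_inherits_state are hypothetical stand-ins. A 'fork' child starts as a copy of the parent and sees the flag already set. A 'spawn' child starts a fresh interpreter, re-imports the module, and sees the flag unset. The sketch assumes it is run as a script file on a platform that offers both start methods (e.g. Linux).

```python
import multiprocessing as mp

# Hypothetical stand-in for "this process has initialized CUDA".
# Real CUDA state lives in the driver, not in a Python variable; the flag
# only mimics how parent state is (or is not) carried into a child process.
_cuda_initialized = False


def fake_cuda_init() -> None:
    """Pretend the parent touched the GPU (e.g. model.to('cuda'))."""
    global _cuda_initialized
    _cuda_initialized = True


def _report_flag(queue) -> None:
    # Runs in the child: report the state the child started with.
    queue.put(_cuda_initialized)


def child_inherits_state(method: str) -> bool:
    """Start one child with the given start method and return what it saw."""
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    proc = ctx.Process(target=_report_flag, args=(queue,))
    proc.start()
    seen = queue.get(timeout=60)
    proc.join()
    return seen


if __name__ == "__main__":
    fake_cuda_init()  # parent "uses CUDA" before any worker exists
    print("fork: ", child_inherits_state("fork"))   # child is a copy of the parent
    print("spawn:", child_inherits_state("spawn"))  # child re-imports the module fresh
```

With real CUDA, the forked child receives "already initialized" state that it cannot use, which is what the RuntimeError reports.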

Official Documentation

https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing

Workarounds

95% success: Set the start method to 'spawn' at the beginning of the script, before any CUDA call: import multiprocessing as mp; mp.set_start_method('spawn', force=True). With 'spawn', each worker starts as a fresh Python interpreter instead of a copy of the parent, so it does not inherit the parent's CUDA state.
```
import multiprocessing as mp

mp.set_start_method('spawn', force=True)  # run once, before any CUDA call
```
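
Several modules or libraries may try to set the start method. A small idempotent helper avoids the RuntimeError that set_start_method raises when a method has already been fixed and force=True is not passed. This is a sketch with a hypothetical helper name (ensure_spawn_start_method), and it uses only the standard library. torch.multiprocessing re-exports the same set_start_method. DataLoader workers follow this global setting when no multiprocessing_context is passed to the DataLoader.

```python
import multiprocessing as mp


def ensure_spawn_start_method() -> str:
    """Make 'spawn' the global start method and return the active method.

    Hypothetical helper: call it once at program start-up, before any CUDA
    call and before any DataLoader with num_workers > 0 is iterated.
    """
    # allow_none=True inspects the setting without locking in the default.
    current = mp.get_start_method(allow_none=True)
    if current is None:
        mp.set_start_method("spawn")
    elif current != "spawn":
        # Another method was already chosen (e.g. by a library);
        # force=True replaces it instead of raising RuntimeError.
        mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    print(ensure_spawn_start_method())
```

Calling the helper a second time is a no-op. This matters because set_start_method without force=True may be called only once per program.
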
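85% success: Keep all CUDA work out of the workers. Have the Dataset's __getitem__ and any custom collate_fn return CPU tensors, enable pin_memory=True with num_workers>0, and move each batch to the GPU in the main training loop. A collate_fn runs inside the worker processes, so moving data to the GPU there triggers the same error.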
```
for batch in loader:  # workers return CPU tensors; pin_memory=True pins them
    batch = batch.to('cuda', non_blocking=True)  # GPU copy happens in the main process
```
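
Here is a minimal sketch of this pattern using a toy dataset. SquaresDataset and sum_batches are hypothetical names. The worker side builds only CPU tensors, and the single host-to-device copy happens in the main process. The sketch falls back to the CPU when no GPU is present, so it also runs on CPU-only machines.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset that only ever builds CPU tensors."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, idx: int) -> torch.Tensor:
        # No .cuda() / .to("cuda") here: this code runs inside worker processes.
        return torch.tensor([float(idx) ** 2])


def sum_batches(num_workers: int = 2) -> float:
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    loader = DataLoader(
        SquaresDataset(),
        batch_size=4,
        num_workers=num_workers,
        pin_memory=use_cuda,  # page-locked host memory speeds up GPU copies
    )
    total = 0.0
    for batch in loader:
        # The host-to-device copy happens here, in the main process.
        batch = batch.to(device, non_blocking=True)
        total += batch.sum().item()
    return total


if __name__ == "__main__":
    print(sum_batches())
```

The workers never call CUDA, so this layout works with the default 'fork' start method even if the parent has already initialized CUDA.
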
90% success: Wrap the training code in an if __name__ == '__main__': block and call mp.set_start_method('spawn') inside it before any CUDA call. The guard is required because 'spawn' re-imports the main module in every worker. Without it, each worker would re-run the training code.
```
import multiprocessing as mp
if __name__ == '__main__':
    mp.set_start_method('spawn')  # before any CUDA call
    train()  # your training entry point
```
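
A complete, minimal script following this layout might look like the sketch below. train() is a hypothetical placeholder that runs a tiny linear model over a TensorDataset. The parent can use CUDA freely because spawned workers never inherit its state. The sketch falls back to the CPU when no GPU is present.

```python
import multiprocessing as mp

import torch
from torch.utils.data import DataLoader, TensorDataset


def train() -> int:
    """Hypothetical training entry point; returns the number of samples seen."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(1, 1).to(device)  # parent may initialize CUDA here
    loader = DataLoader(
        TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1)),
        batch_size=5,
        num_workers=2,
    )
    seen = 0
    for (batch,) in loader:
        out = model(batch.to(device))
        seen += out.shape[0]
    return seen


if __name__ == "__main__":
    # Set the start method inside the guard: 'spawn' re-imports this module
    # in every worker, and the guard keeps the workers from calling train().
    mp.set_start_method("spawn")
    print(train())
```

To switch a single loader rather than the whole program, DataLoader also accepts a multiprocessing_context argument, for example multiprocessing_context='spawn'.
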

Dead Ends

Common approaches that don't work:

Set num_workers=0 in DataLoader to disable multiprocessing 60% fail
This avoids the error only by loading all data in the main process. That removes parallelism and can significantly slow down training when the dataset is large or preprocessing is heavy.
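Move CUDA operations to after DataLoader workers are created 95% fail
DataLoader forks a new set of workers each time it is iterated (every epoch, unless persistent_workers=True). After the parent's first CUDA call, every later fork inherits the initialized state. Reordering therefore does not remove the fork-after-CUDA conflict, and any CUDA call inside a worker still fails.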
Use torch.cuda.set_device(0) inside worker_init_fn 90% fail
worker_init_fn runs inside the already-forked worker, and torch.cuda.set_device is itself a CUDA call. It raises the same re-initialization error instead of fixing it.
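
The "move CUDA operations later" dead end fails partly because DataLoader does not create its workers only once. Unless persistent_workers=True, every new iteration over the loader (typically every epoch) starts a fresh set of worker processes. Even if the first fork happens before any CUDA call, later forks happen after the parent has used the GPU. The sketch below makes this visible on a CPU-only machine by recording the PID of the process that produced each item. WorkerPidDataset and worker_pids_per_epoch are hypothetical names. persistent_workers=True is not a fix either: CUDA must still never be called inside the workers.

```python
import os

from torch.utils.data import DataLoader, Dataset


class WorkerPidDataset(Dataset):
    """Hypothetical dataset whose items record which process produced them."""

    def __len__(self) -> int:
        return 4

    def __getitem__(self, idx: int) -> int:
        return os.getpid()


def worker_pids_per_epoch(epochs: int = 2, persistent: bool = False) -> list:
    """Return, for each epoch, the set of worker PIDs that produced items."""
    loader = DataLoader(
        WorkerPidDataset(),
        batch_size=1,
        num_workers=2,
        persistent_workers=persistent,
    )
    pids_per_epoch = []
    for _ in range(epochs):
        # Each `for ... in loader` builds a new iterator; without persistent
        # workers that means a freshly started set of worker processes.
        pids_per_epoch.append({int(batch) for batch in loader})
    return pids_per_epoch


if __name__ == "__main__":
    print(worker_pids_per_epoch(persistent=False))  # different PIDs per epoch
    print(worker_pids_per_epoch(persistent=True))   # same PIDs reused
```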