# RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.

- **ID:** `pytorch/dataset-worker-fork-cuda`
- **Domain:** pytorch
- **Category:** config_error
- **Error Code:** `CUDA_ERROR_NOT_INITIALIZED`
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

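With the default 'fork' start method on Linux, DataLoader workers are created by copying the parent process, including its already-initialized CUDA state. CUDA cannot be re-initialized in a forked child, so the first CUDA call inside a worker raises this error. Typical triggers are `.cuda()` or `.to('cuda')` in a dataset's `__getitem__` or in a `collate_fn`. CUDA is usually initialized in the parent as soon as the script moves a model or tensor to the GPU.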
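A minimal sketch of the failure mode, assuming a Linux host with a CUDA GPU and the `fork` start method. The two dataset classes and the `reproduce` function are hypothetical names chosen for illustration. `CudaInWorkerDataset` calls `.cuda()` in `__getitem__`, which executes inside each DataLoader worker. `CpuOnlyDataset` is the safe counterpart that only builds CPU tensors.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CudaInWorkerDataset(Dataset):
    """Hypothetical dataset that touches CUDA inside __getitem__ (unsafe with fork)."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, idx: int) -> torch.Tensor:
        # With num_workers > 0 this runs in a worker process. If the parent had
        # already initialized CUDA before forking, this CUDA call fails.
        return torch.tensor([float(idx)]).cuda()


class CpuOnlyDataset(Dataset):
    """Hypothetical safe variant: workers only build CPU tensors."""

    def __len__(self) -> int:
        return 8

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.tensor([float(idx)])


def reproduce() -> None:
    """Hypothetical reproduction; needs a CUDA GPU and the fork start method."""
    torch.zeros(1, device="cuda")  # initializes CUDA in the parent process
    loader = DataLoader(CudaInWorkerDataset(), batch_size=4, num_workers=2)
    next(iter(loader))  # expected to fail inside a forked worker
```

On such a machine, calling `reproduce()` is expected to surface the error, re-raised in the parent as an exception from the worker process. Swapping in `CpuOnlyDataset` avoids it because the workers never touch CUDA.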

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| PyTorch 1.13.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| CUDA 11.7 | active | — | — |
| CUDA 12.1 | active | — | — |
| Ubuntu 20.04 | active | — | — |
| Ubuntu 22.04 | active | — | — |

## Workarounds

1. **Set the start method to `'spawn'` at the beginning of the script, before any CUDA call.** Spawned workers start from a fresh Python interpreter and do not inherit the parent's CUDA state. `force=True` overrides a start method that has already been fixed; without it, the call raises `RuntimeError` in that case. With `spawn`, the dataset and any `collate_fn` are pickled and sent to each worker, so they must be picklable (for example, defined at module level rather than as lambdas). (95% success)
   ```python
   import multiprocessing as mp
   mp.set_start_method('spawn', force=True)  # run before any CUDA call
   ```
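2. **Keep worker code CPU-only and move batches to the GPU in the main process.** The dataset's `__getitem__` and any custom `collate_fn` run inside the workers, so they should return CPU tensors only. Do the transfer in the training loop, where `pin_memory=True` lets `.to(device, non_blocking=True)` copy asynchronously. Moving the model to CPU does not help, because CUDA stays initialized in the parent. A `collate_fn` that moves data to the GPU would itself trigger the error, since it runs in the workers. (85% success)
   ```python
   from torch.utils.data import DataLoader
   # `dataset` returns CPU tensors; `device` is e.g. torch.device('cuda')
   loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)
   for inputs, targets in loader:  # this loop runs in the main process
       inputs = inputs.to(device, non_blocking=True)
       targets = targets.to(device, non_blocking=True)
   ```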
3. **Wrap the training code in an `if __name__ == '__main__':` block and call `mp.set_start_method('spawn')` there, before any CUDA call.** Under `spawn`, each worker re-imports the main module. The guard keeps that re-import from running the training code again. (90% success)
   ```python
   import multiprocessing as mp

   if __name__ == '__main__':
       mp.set_start_method('spawn')  # before any CUDA call
       train()  # your training entry point
   ```
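The workarounds above combine naturally. Below is a minimal end-to-end sketch. The toy `TensorDataset`, the linear model, and the helper names `make_loader` and `train` are hypothetical stand-ins for real data and model code. The start method is set to `'spawn'` under the `__main__` guard, and workers only index CPU tensors. Batches move to the device inside the training loop. It falls back to CPU when no GPU is present.

```python
import multiprocessing as mp

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers: int = 2) -> DataLoader:
    # Hypothetical toy data: 64 samples, 3 features, 2 classes (CPU tensors only).
    dataset = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
    # Pin host memory only when a GPU exists, so the later copy can be asynchronous.
    return DataLoader(dataset, batch_size=16, num_workers=num_workers,
                      pin_memory=torch.cuda.is_available())


def train(num_workers: int = 2) -> int:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(3, 2).to(device)  # CUDA is touched in the main process only
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    steps = 0
    for inputs, targets in make_loader(num_workers):
        # Host-to-device transfer happens here, in the main process, never in a worker.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        loss = F.cross_entropy(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        steps += 1
    return steps


if __name__ == "__main__":
    # Fix the start method before any CUDA call and before any worker starts.
    mp.set_start_method("spawn", force=True)
    print(train())  # 64 samples / batch size 16 -> 4 steps
```

Keeping the transfer in the main loop means the script stays correct even if the start method is later switched back to `fork`, because no worker ever calls CUDA.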

## Dead Ends

- **Set `num_workers=0` in DataLoader to disable multiprocessing** — This avoids the error because data loading then runs in the main process. However, it removes parallel loading entirely and can slow training significantly, especially with large datasets or heavy preprocessing. (60% fail)
- **Move CUDA operations to after DataLoader workers are created** — Unless `persistent_workers=True`, the DataLoader starts new workers at the beginning of every epoch. By the second epoch the parent has already initialized CUDA, so any newly forked worker that touches CUDA hits the same error. (95% fail)
- **Use `torch.cuda.set_device(0)` inside `worker_init_fn`** — `worker_init_fn` runs inside the already-forked worker, so `torch.cuda.set_device` is itself a CUDA call subject to the same restriction. It cannot undo the fact that CUDA was initialized in the parent before the fork. (90% fail)
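The last two dead ends fail for the same reason: a CUDA call still runs inside a worker process. As a debugging aid, a transfer helper can refuse to touch the GPU from a worker and fail with a clearer message. The helper `to_device_in_main` is hypothetical and not a PyTorch API. It relies on `torch.utils.data.get_worker_info()`, which returns `None` in the main process and a worker-info object inside a DataLoader worker.

```python
import torch
from torch.utils.data import get_worker_info


def to_device_in_main(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: move a tensor to `device`, but only from the main process."""
    # get_worker_info() returns None outside DataLoader worker processes.
    if get_worker_info() is not None:
        raise RuntimeError(
            "to_device_in_main() called inside a DataLoader worker; "
            "return CPU tensors from the dataset and move them in the training loop"
        )
    return tensor.to(device, non_blocking=True)
```

Calling it from a dataset or `collate_fn` then fails immediately with an actionable message instead of the CUDA re-initialization error.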