# RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

- **ID:** `pytorch/dataset-worker-fork-cuda`
- **Domain:** pytorch
- **Category:** config_error
- **Error code:** `CUDA_ERROR_NOT_INITIALIZED`
- **Verification level:** ai_generated
- **Fix rate:** 90%

## Root Cause

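On Linux, the default `'fork'` start method creates child processes that inherit the parent's CUDA state, but CUDA cannot be re-initialized in a forked process. If the parent has already initialized CUDA, any `DataLoader` worker that then tries to use CUDA raises this error.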
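The two conditions behind the error (a `fork`-style start method, plus a parent process that has already initialized CUDA) can be inspected before creating any workers. The sketch below is a minimal diagnostic, not part of PyTorch itself; the helper name `fork_cuda_status` is hypothetical, while `multiprocessing.get_start_method` and `torch.cuda.is_initialized` are existing calls.

```python
import multiprocessing as mp

import torch


def fork_cuda_status() -> dict:
    """Report the two conditions that combine to produce the error (hypothetical helper)."""
    return {
        # None means no start method has been chosen yet, so the platform default applies.
        # allow_none=True avoids fixing the start method as a side effect of asking.
        "start_method": mp.get_start_method(allow_none=True),
        # True once this process has initialized CUDA, for example via tensor.cuda().
        "cuda_initialized": torch.cuda.is_initialized(),
    }


if __name__ == "__main__":
    print(fork_cuda_status())
```

If the reported start method is `fork` (or unset on a platform where `fork` is the default) and CUDA is already initialized, workers that touch CUDA can be expected to fail with this error.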

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| PyTorch 1.13.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| CUDA 11.7 | active | — | — |
| CUDA 12.1 | active | — | — |
| Ubuntu 20.04 | active | — | — |
| Ubuntu 22.04 | active | — | — |

## Solutions

1. Set the start method to `'spawn'` at the beginning of the script: `import multiprocessing as mp; mp.set_start_method('spawn', force=True)`. Spawned processes start from a fresh interpreter and do not inherit the parent's CUDA context.
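2. Keep CUDA calls out of the worker processes. Have the `Dataset` and any custom `collate_fn` return CPU tensors, create the `DataLoader` with `pin_memory=True` and `num_workers>0`, and move each batch to the GPU in the main process after loading, for example with `batch.to(device, non_blocking=True)`. Note that `collate_fn` runs inside the worker when `num_workers>0`, so moving tensors to the GPU there triggers the same error under `'fork'`.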
3. Wrap the training code in an `if __name__ == '__main__':` block and call `mp.set_start_method('spawn')` before any CUDA calls. Example: `if __name__ == '__main__': mp.set_start_method('spawn'); train()`
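The steps above can be combined into one script. The following is a minimal sketch under stated assumptions, not a definitive implementation: `SquaresDataset`, `build_loader`, and `train` are hypothetical names introduced for illustration, and the toy data stands in for a real dataset. Workers only produce CPU tensors, the transfer to the GPU happens in the main process, and the start method is set to `'spawn'` inside the `__main__` guard.

```python
import multiprocessing as mp

import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical CPU-only dataset: item i is the pair (i, i * i)."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int):
        # Workers only ever build CPU tensors; no CUDA call happens here.
        x = torch.tensor([float(index)])
        return x, x * x


def build_loader(num_workers: int) -> DataLoader:
    return DataLoader(
        SquaresDataset(8),
        batch_size=4,
        num_workers=num_workers,
        # Pinned memory only helps when a GPU is present.
        pin_memory=torch.cuda.is_available(),
    )


def train(num_workers: int = 2) -> float:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    total = 0.0
    for x, y in build_loader(num_workers):
        # The transfer to the GPU happens in the main process, after loading.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        total += float((y - x).sum())
    return total


if __name__ == "__main__":
    # Set the start method before any worker is created. The guard keeps
    # spawned workers, which re-import this module, from re-running training.
    mp.set_start_method("spawn", force=True)
    print(train())
```

Because `'spawn'` pickles the dataset and sends it to each worker, the dataset class is defined at module level so the workers can import it.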

## Ineffective Attempts

- **Set `num_workers=0` in `DataLoader` to disable multiprocessing**: This eliminates parallel data loading entirely, which significantly slows loading for large datasets or heavy preprocessing. (60% failure rate)
- **Move CUDA operations to after `DataLoader` workers are created**: The error occurs because forked workers inherit the parent's CUDA context; reordering operations in the parent does not change that inheritance. (95% failure rate)
- **Use `torch.cuda.set_device(0)` inside `worker_init_fn`**: Setting the device after the fork does not resolve the CUDA re-initialization issue; the inherited context is already unusable in the child. (90% failure rate)