pytorch
runtime_error
ai_generated
true
RuntimeError: DataLoader worker (pid 12345) pin_memory(): CUDA error: invalid device context
ID: pytorch/dataloader-pin-memory-cuda
88% fix rate
86% confidence
1 evidence item
First seen 2023-02-10
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| torch>=1.6.0 | active | — | — | — |
| CUDA>=11.0 | active | — | — | — |
Root cause analysis
A DataLoader with pin_memory=True and num_workers > 0 starts worker processes. When those workers are created with fork and then try to use CUDA, the device context is invalid, because the CUDA runtime does not support fork once it has been initialized in the parent process.
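To check whether the conditions described above apply in a given process, a small diagnostic along these lines may help. It is a sketch, not part of PyTorch: the helper name `describe_loader_environment` is hypothetical, and it only reports state without changing it.

```python
import multiprocessing as mp

import torch


def describe_loader_environment():
    """Report the process state relevant to fork-related CUDA errors."""
    return {
        # None means no start method has been set explicitly yet, so the
        # platform default will be used when workers are created.
        "start_method": mp.get_start_method(allow_none=True),
        # Whether this PyTorch build can see a CUDA device at all.
        "cuda_available": torch.cuda.is_available(),
        # True once CUDA has been initialized in this process; forking
        # workers after this point is the risky combination.
        "cuda_initialized": torch.cuda.is_initialized(),
    }
```

If `cuda_initialized` is true and the start method is `fork` (or unset on a platform that defaults to fork), workers created afterwards inherit an already initialized CUDA state.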
Official documentation
https://pytorch.org/docs/stable/data.html#multi-process-data-loading
Solutions
- Set the multiprocessing start method to 'spawn' before creating the DataLoader: torch.multiprocessing.set_start_method('spawn', force=True)
- Use pin_memory=False in the DataLoader and manually move tensors to the GPU after loading.
- Move CUDA initialization after DataLoader creation, or use single-process loading with num_workers=0.
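The first two solutions can be combined in one loader, sketched below. Note the substitution: instead of the global torch.multiprocessing.set_start_method call, this sketch uses the DataLoader's `multiprocessing_context` argument, which scopes 'spawn' to a single loader. The helper names `make_loader` and `to_device` are hypothetical, and the code falls back to CPU when no GPU is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, num_workers=2, batch_size=4):
    """Build a DataLoader whose workers start with 'spawn' instead of fork."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        # Pinning is turned off; batches are moved to the device explicitly.
        pin_memory=False,
        # multiprocessing_context is only accepted when num_workers > 0.
        multiprocessing_context="spawn" if num_workers > 0 else None,
    )


def to_device(batch, device):
    """Move every tensor in a batch (a list or tuple of tensors) to device."""
    return [tensor.to(device) for tensor in batch]


# Typical use, kept under a main guard because 'spawn' re-imports the
# script in every worker process:
#
# if __name__ == "__main__":
#     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#     dataset = TensorDataset(torch.zeros(8, 2), torch.arange(8))
#     for batch in make_loader(dataset):
#         features, labels = to_device(batch, device)
```

With 'spawn', the dataset and any collate function must be picklable, so they should be defined at module level rather than inside a function or lambda.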
Ineffective attempts
Common but ineffective approaches:
- 80% failure rate. Workers need a CUDA context for pin_memory; hiding the devices from them defeats its purpose.
- 90% failure rate. More workers increase the chance of CUDA context issues.