pytorch runtime_error ai_generated true

RuntimeError: DataLoader worker (pid 12345) pin_memory(): CUDA error: invalid device context

ID: pytorch/dataloader-pin-memory-cuda

Fix Rate: 88%
Confidence: 86%
Evidence: 1
First Seen: 2023-02-10

Version Compatibility

Version        Status   Introduced   Deprecated   Notes
torch>=1.6.0   active   -            -            -
CUDA>=11.0     active   -            -            -

Root Cause

With pin_memory=True, the DataLoader's worker processes try to use CUDA after being forked from a parent process that has already initialized it. CUDA does not support fork after initialization, so the forked workers end up with an invalid device context.
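
The condition described above can be checked before any DataLoader is created. The following is a minimal diagnostic sketch, not part of PyTorch: the helper name fork_risk_report and its dictionary keys are hypothetical. It relies on torch.cuda.is_initialized() and on the standard-library multiprocessing module, whose start-method state torch.multiprocessing shares.

```python
import multiprocessing as mp

import torch


def fork_risk_report():
    """Report whether this process would fork workers after CUDA initialization.

    Hypothetical helper for illustration only.
    """
    # allow_none=True avoids fixing the start method as a side effect.
    # If nothing was set yet, fall back to the platform default, which is
    # documented as the first entry of get_all_start_methods().
    method = mp.get_start_method(allow_none=True) or mp.get_all_start_methods()[0]
    cuda_initialized = torch.cuda.is_initialized()
    return {
        "start_method": method,
        "cuda_initialized": cuda_initialized,
        # Forked children inherit the parent's CUDA state, which CUDA does not support.
        "at_risk": cuda_initialized and method == "fork",
    }


if __name__ == "__main__":
    print(fork_risk_report())
```

If at_risk is True, one of the workarounds listed under Workarounds is likely needed before iterating over a multi-worker DataLoader.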

generic

Official Documentation

https://pytorch.org/docs/stable/data.html#multi-process-data-loading

Workarounds

  1. 95% success: Set the multiprocessing start method to 'spawn' by calling torch.multiprocessing.set_start_method('spawn', force=True) before creating the DataLoader.
  2. 85% success: Use pin_memory=False in the DataLoader and manually move tensors to the GPU after loading.
  3. 90% success: Move CUDA initialization after DataLoader creation, or use single-process loading with num_workers=0.
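
The workarounds above can be sketched in one small script. This is an illustration under stated assumptions rather than a definitive fix: RandomVectors, make_loader, run_epoch, and main are hypothetical names, and the dataset is a stand-in for real data. The script falls back to the CPU when no GPU is available.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, Dataset


class RandomVectors(Dataset):
    """Hypothetical stand-in dataset: `length` random CPU vectors of size `dim`."""

    def __init__(self, length=64, dim=8):
        self.data = torch.randn(length, dim)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]


def make_loader(num_workers=2, pin_memory=True, batch_size=16):
    # num_workers=0 loads in the main process (workaround 3);
    # pin_memory=False skips pinning entirely (workaround 2).
    return DataLoader(
        RandomVectors(),
        batch_size=batch_size,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )


def run_epoch(loader, device):
    """Move each batch to `device` by hand and return the number of samples seen."""
    seen = 0
    for batch in loader:
        # non_blocking only overlaps the copy when the source tensor is pinned.
        batch = batch.to(device, non_blocking=True)
        seen += batch.shape[0]
    return seen


def main():
    # Workaround 1: select 'spawn' before any DataLoader is created.
    mp.set_start_method("spawn", force=True)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(run_epoch(make_loader(num_workers=2, pin_memory=True), device))


if __name__ == "__main__":
    # The guard matters with 'spawn': worker processes re-import this module.
    main()
```

Calling make_loader(num_workers=0) or make_loader(pin_memory=False) switches to the other two workarounds without changing the training loop, since run_epoch always moves batches to the device explicitly.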

Dead Ends

Common approaches that don't work:

  1. 80% fail: Hiding the CUDA devices from the workers. Workers need a CUDA context for pin_memory, so hiding the devices defeats the purpose.
  2. 90% fail: Increasing the number of workers. More workers increase the chance of CUDA context issues.