# RuntimeError: DataLoader worker (pid 12345) received signal 11 (Segmentation fault). Possible causes: shared memory exhaustion or corrupted shared memory files in /dev/shm.

- **ID:** `pytorch/dataloader-worker-segfault-shm`
- **Domain:** pytorch
- **Category:** system_error
- **Error Code:** `SIGSEGV`
- **Verification:** ai_generated
- **Fix Rate:** 85%

## Root Cause

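DataLoader workers hand batches to the main process through shared memory backed by `/dev/shm`, which avoids copying the data between processes. When `/dev/shm` fills up (for example because of a high `num_workers`, large batch sizes, or other processes writing to the same mount), a worker can no longer get the shared memory it needs and crashes with a segmentation fault.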
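
To confirm that `/dev/shm` is actually close to full before changing anything, a check along these lines may help. It is a minimal sketch using only the Python standard library; the helper names `shm_usage` and `describe` and the 90% warning threshold are illustrative assumptions, not part of PyTorch.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for `path`, or None if it does not exist."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def describe(path="/dev/shm", warn_fraction=0.9):
    """Build a one-line summary; warn_fraction is an arbitrary illustrative threshold."""
    stats = shm_usage(path)
    if stats is None:
        return f"{path} not found (probably not a Linux host)"
    total, used, free = stats
    gib = 2**30
    line = f"{path}: {used / gib:.2f} GiB used of {total / gib:.2f} GiB"
    if total and used / total >= warn_fraction:
        line += " (nearly full)"
    return line


print(describe())
```

Running it once before training and once while the workers are active shows how much of the mount the DataLoader itself consumes.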

## Version Compatibility

| Version / Environment | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| PyTorch 1.10.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| Linux kernel 5.15 | active | — | — |
| Ubuntu 20.04 | active | — | — |
| Ubuntu 22.04 | active | — | — |
| Docker containers | active | — | — |

## Workarounds

1. **Reduce the number of DataLoader workers. Start with num_workers=2 and increase gradually.** (85% success)
   ```
   from torch.utils.data import DataLoader
   loader = DataLoader(dataset, batch_size=64, num_workers=2)  # raise num_workers step by step
   ```
2. **Increase the size of /dev/shm by remounting it with a larger size. In Docker, pass the --shm-size flag when starting the container.** (95% success)
   ```
   sudo mount -o remount,size=16G /dev/shm   # host or VM
   docker run --shm-size=16g <image>         # Docker: set at container start
   ```
3. **Use multiprocessing_context='spawn' in DataLoader, and lower memory pressure with pin_memory=False and prefetch_factor=2.** (80% success)
   ```
   from torch.utils.data import DataLoader
   loader = DataLoader(dataset, num_workers=2, multiprocessing_context='spawn', pin_memory=False, prefetch_factor=2)
   ```
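
The trade-off between fewer workers and a larger `/dev/shm` can be made concrete with some rough arithmetic. The sketch below assumes a simplified model in which every worker keeps up to `prefetch_factor` batches in shared memory at once, so it should be read as an order-of-magnitude estimate rather than an exact accounting. The function names, the 50% headroom, and the 3×224×224 float32 sample size are all illustrative assumptions.

```python
def estimated_shm_bytes(num_workers, batch_size, bytes_per_sample, prefetch_factor=2):
    """Rough upper bound on bytes of batches in flight across all workers."""
    return num_workers * prefetch_factor * batch_size * bytes_per_sample


def max_workers_for_budget(shm_free_bytes, batch_size, bytes_per_sample,
                           prefetch_factor=2, headroom=0.5):
    """Largest num_workers whose estimate fits in `headroom` of the free space."""
    per_worker = prefetch_factor * batch_size * bytes_per_sample
    if per_worker <= 0:
        raise ValueError("batch_size, bytes_per_sample and prefetch_factor must be positive")
    return int(shm_free_bytes * headroom // per_worker)


# Illustrative sample: one 3x224x224 float32 image = 602,112 bytes.
SAMPLE_BYTES = 3 * 224 * 224 * 4

in_flight = estimated_shm_bytes(num_workers=4, batch_size=64, bytes_per_sample=SAMPLE_BYTES)
print(f"{in_flight / 2**20:.1f} MiB")  # 294.0 MiB

# Docker's default /dev/shm is 64 MiB; compare with a 16 GiB mount.
print(max_workers_for_budget(64 * 2**20, 64, SAMPLE_BYTES))  # 0
print(max_workers_for_budget(16 * 2**30, 64, SAMPLE_BYTES))  # 111
```

Under these assumptions, four workers with batches of 64 images need roughly 294 MiB, which is well beyond a container's default 64 MiB mount. This is why raising `--shm-size` tends to work where trimming `num_workers` alone does not.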

## Dead Ends

- **Increase num_workers to speed up data loading** — More workers consume more shared memory, exacerbating the exhaustion problem and causing more frequent crashes. (95% fail)
- **Set pin_memory=False in DataLoader** — While this reduces shared memory usage, it may not be sufficient if /dev/shm is already full from other processes or large batch sizes. (70% fail)
- **Restart the system to clear /dev/shm** — This is a temporary fix; the problem recurs when the training runs again with the same configuration. (60% fail)
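
Rather than rebooting, it can be more informative to see which files are actually occupying `/dev/shm`, since other processes or leftovers from crashed runs may be responsible. The following is a minimal standard-library sketch; `largest_entries` is a hypothetical helper name, and the code only lists files without deleting anything.

```python
import os


def largest_entries(path="/dev/shm", top=10):
    """Return up to `top` (size_bytes, name) pairs for regular files in `path`, largest first."""
    entries = []
    try:
        with os.scandir(path) as it:
            for entry in it:
                try:
                    if entry.is_file(follow_symlinks=False):
                        size = entry.stat(follow_symlinks=False).st_size
                        entries.append((size, entry.name))
                except OSError:
                    # The file vanished or is unreadable; skip it.
                    continue
    except OSError:
        # Directory missing or not accessible (for example on a non-Linux host).
        return []
    entries.sort(reverse=True)
    return entries[:top]


for size, name in largest_entries():
    print(f"{size / 2**20:10.1f} MiB  {name}")
```

If the largest entries belong to a process that is no longer running, removing those files frees the space without a reboot. Whether a given file is safe to remove depends on what created it, so check the owner first.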
