# RuntimeError: Dataset shuffling requires a deterministic seed for iterable datasets, but seed is None

- **ID:** `huggingface/dataset-shuffling-iterator-break`
- **Domain:** huggingface
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 88%

## Root Cause

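An `IterableDataset` is consumed as a stream and has no indices to permute, so it can only be shuffled approximately, through a buffer. Without a fixed seed, that shuffle order cannot be deterministically replayed, which is what the error reports (`seed is None`).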
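
To illustrate why a seed matters for stream shuffling, here is a minimal pure-Python sketch of buffer-based shuffling. It is an illustration of the general technique, not the `datasets` implementation; the helper name `buffer_shuffle` is hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffer_shuffle(
    stream: Iterable[T], buffer_size: int, seed: Optional[int]
) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer (hypothetical helper)."""
    # With seed=None the generator is seeded from OS entropy,
    # so the resulting order cannot be replayed.
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before yielding anything.
            buffer.append(item)
            continue
        # Yield a random buffered item and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Flush whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer


run_a = list(buffer_shuffle(range(20), buffer_size=5, seed=42))
run_b = list(buffer_shuffle(range(20), buffer_size=5, seed=42))
print(run_a == run_b)  # same seed, same order
```

Every element is yielded exactly once, and two runs with the same seed produce the same order. A larger buffer gives a more thorough shuffle at the cost of memory.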

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| datasets>=2.10.0 | active | — | — |
| torch>=1.13.0 | active | — | — |

## Workarounds

1. **Specify a seed when shuffling.** This ensures a deterministic shuffle order for the streaming dataset. (95% success)
   ```
   dataset = dataset.shuffle(seed=42, buffer_size=1000)
   ```
2. **Disable shuffling in the DataLoader and shuffle externally.** Create the loader with `shuffle=False`; if the data is also available as a map-style dataset, shuffle its indices manually before each epoch. (80% success)
   ```
   train_loader = DataLoader(dataset, shuffle=False)
   ```

## Dead Ends

- **Set `shuffle=True` on the DataLoader without fixing the seed** — The DataLoader shuffles by sampling indices, which an iterable-style dataset does not have; PyTorch rejects `shuffle=True` for an `IterableDataset` with a `ValueError`. (80% fail)
- **Convert the IterableDataset to a map-style dataset by calling `.to_iterable_dataset()`** — This method converts in the opposite direction, from a map-style `Dataset` to an `IterableDataset`. Going from an iterable to a map-style dataset requires materializing the entire dataset, which defeats the purpose of streaming. (90% fail)
- **Use `dataset.shuffle(buffer_size=1000)` without a seed** — The shuffle method on IterableDataset requires a seed parameter; omitting it raises the same error. (100% fail)
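
The first dead end can be reproduced with plain PyTorch. The sketch below uses a hypothetical `CountingStream` class in place of a real streaming dataset and shows that the loader is rejected at construction time, while the same dataset works once `shuffle` is left unset.

```python
from typing import Optional

from torch.utils.data import DataLoader, IterableDataset


class CountingStream(IterableDataset):
    """Hypothetical iterable-style dataset that yields 0..n-1."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


def try_shuffled_loader(dataset: IterableDataset) -> Optional[str]:
    """Return the error message if DataLoader rejects shuffle=True, else None."""
    try:
        DataLoader(dataset, shuffle=True)
    except ValueError as err:
        return str(err)
    return None


message = try_shuffled_loader(CountingStream(8))
print(message)

# Leaving shuffle unset works; any shuffling must happen inside the dataset itself.
loader = DataLoader(CountingStream(8), batch_size=4)
batches = [batch.tolist() for batch in loader]
print(batches)
```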
