# RuntimeError: step() called before loss.backward(). Ensure you call loss.backward() before optimizer.step().

- **ID:** `pytorch/optimizer-step-without-loss-backward`
- **Domain:** pytorch
- **Category:** runtime_error
- **Verification:** ai_generated
- **Fix Rate:** 95%

## Root Cause

`optimizer.step()` is invoked without a preceding `loss.backward()` call in the same iteration. No fresh gradients are computed, so each parameter's `.grad` is stale (left over from an earlier iteration), zeroed, or unset (`None`), and the optimizer has nothing valid to apply.
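
To make the root cause concrete, the sketch below inspects a parameter's `.grad` before and after `backward()`. The toy `nn.Linear` model and the squared-output loss are assumptions chosen only for illustration; they do not come from the original report.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # hypothetical toy model, for illustration only

# Before any backward() call, no gradient has been computed for the weights.
grad_before = model.weight.grad

# Forward pass and an arbitrary scalar loss.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# After backward(), .grad holds a tensor with the same shape as the parameter.
grad_after = model.weight.grad
print(grad_before, grad_after.shape)
```

Checking `.grad` this way is a quick diagnostic when it is unclear whether `backward()` actually ran before the optimizer step.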

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| PyTorch 1.12.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Workarounds

1. **Run the training loop in the correct order: forward pass, `optimizer.zero_grad()`, `loss.backward()`, then `optimizer.step()`.** (95% success)
   ```python
   for inputs, targets in dataloader:
       outputs = model(inputs)
       loss = criterion(outputs, targets)
       optimizer.zero_grad()  # clear gradients from the previous iteration
       loss.backward()        # compute fresh gradients
       optimizer.step()       # update parameters with those gradients
   ```
2. **Guard `optimizer.step()` with a check that the loss is attached to the autograd graph, and skip the step otherwise.** (85% success)
   ```python
   # grad_fn is None when the loss was computed under torch.no_grad()
   # or from detached tensors, so no gradient can flow from it.
   if loss.grad_fn is not None:
       optimizer.step()
   else:
       print('Skipping step: no gradient')
   ```
3. **Use the `torch.no_grad()` context manager only around inference (for example, validation), never around the forward pass that feeds `loss.backward()`.** (90% success)
   ```python
   # Validation only: no graph is recorded inside this block,
   # so these outputs cannot be backpropagated through.
   with torch.no_grad():
       outputs = model(inputs)
   ```
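
The three workarounds can be combined into one loop. The following is a minimal sketch under stated assumptions: the synthetic regression data, the `batches` list standing in for a dataloader, the `evaluate` helper, and the hyperparameters are all hypothetical. The guard here checks that every parameter has a populated `.grad`, which is a stricter variant of the `loss.grad_fn` check in workaround 2.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data (hypothetical stand-in for a real dataloader).
inputs = torch.randn(64, 4)
targets = inputs.sum(dim=1, keepdim=True)
batches = [(inputs[i:i + 16], targets[i:i + 16]) for i in range(0, 64, 16)]

model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)


def evaluate():
    """Validation pass: no_grad() wraps inference only."""
    model.eval()
    with torch.no_grad():
        return criterion(model(inputs), targets).item()


initial_loss = evaluate()

for epoch in range(20):
    model.train()
    for x, y in batches:
        optimizer.zero_grad()          # clear gradients from the last step
        loss = criterion(model(x), y)  # forward pass with autograd enabled
        loss.backward()                # compute fresh gradients
        # Step only if backward() populated a gradient for every parameter.
        if all(p.grad is not None for p in model.parameters()):
            optimizer.step()

final_loss = evaluate()
print(initial_loss, final_loss)
```

Keeping validation in its own function makes it harder to accidentally wrap the training forward pass in `torch.no_grad()`.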

## Dead Ends

- **Call optimizer.zero_grad() before loss.backward() to reset gradients** — `zero_grad()` only clears gradients; it does not compute them. The core issue is the missing `backward()` call, not gradient accumulation. (80% fail)
- **Set requires_grad=False on all model parameters** — This disables gradient computation entirely, making the optimizer step meaningless and preventing learning. (95% fail)
- **Use a learning rate scheduler step before optimizer step** — Scheduler step does not trigger gradient computation; it only adjusts the learning rate. The error persists. (90% fail)
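
The first two dead ends can be checked directly. This is an illustrative sketch rather than a reproduction of the original report: the toy model and loss are assumptions, and `set_to_none=True` is passed explicitly because the default of `zero_grad()` differs between PyTorch releases.

```python
import torch
from torch import nn

model = nn.Linear(4, 1)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Dead end: zero_grad() resets gradients but never computes them.
optimizer.zero_grad(set_to_none=True)
grads_after_zero_grad = [p.grad for p in model.parameters()]

# Dead end: freezing every parameter detaches the loss from autograd.
for p in model.parameters():
    p.requires_grad_(False)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss_has_graph = loss.grad_fn is not None

# With nothing requiring gradients, backward() itself raises.
try:
    loss.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

print(grads_after_zero_grad, loss_has_graph, backward_failed)
```

In both cases the parameters end up without usable gradients, so neither change addresses the missing `backward()` call.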