# RuntimeError: step() was called before loss.backward(). Make sure loss.backward() is called before optimizer.step().

- **ID:** `pytorch/optimizer-step-without-loss-backward`
- **Domain:** pytorch
- **Category:** runtime_error
- **Verification level:** ai_generated
- **Fix rate:** 95%

## Root Cause

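`optimizer.step()` was called without a preceding `loss.backward()`, so no gradients were computed for the current batch. The optimizer then updates parameters from stale or zero gradients, or skips parameters whose `.grad` is still `None`.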
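
A minimal sketch of this situation, assuming a plain `torch.optim.SGD` optimizer on a toy `nn.Linear` model (both chosen here for illustration, not taken from the original report). It checks that `.grad` is `None` before any `backward()` call, that `step()` then leaves the parameters untouched, and that the usual sequence does update them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy model and optimizer, assumed for illustration only.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot the parameters so changes can be detected later.
before = [p.detach().clone() for p in model.parameters()]

# backward() has not run yet, so every parameter's .grad is still None.
grads_missing = all(p.grad is None for p in model.parameters())

# step() has no gradients to apply, so the parameters stay the same.
optimizer.step()
unchanged = all(
    torch.equal(b, p.detach()) for b, p in zip(before, model.parameters())
)

# The usual sequence: forward pass, zero_grad(), backward(), step().
loss = model(torch.randn(4, 3)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
changed = any(
    not torch.equal(b, p.detach()) for b, p in zip(before, model.parameters())
)

print(grads_missing, unchanged, changed)
```

Comparing parameter snapshots before and after `step()` is a cheap way to confirm whether an update actually happened when a training run appears to make no progress.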

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| PyTorch 1.12.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Solutions

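1. Make sure the training loop runs in the correct order: forward pass, `optimizer.zero_grad()`, `loss.backward()`, then `optimizer.step()`.

   ```python
   for inputs, targets in dataloader:
       outputs = model(inputs)
       loss = criterion(outputs, targets)
       optimizer.zero_grad()
       loss.backward()   # compute gradients first
       optimizer.step()  # then apply them
   ```
2. Add a conditional check before `optimizer.step()`. Note that `loss.grad_fn` only shows whether the loss is attached to the autograd graph (it is `None` for a loss computed under `torch.no_grad()`); it does not by itself prove that `backward()` has run.

   ```python
   if loss.grad_fn is not None:
       optimizer.step()
   else:
       print('Skipping step: no gradient')
   ```
3. Use the `torch.no_grad()` context manager only around inference, never around the forward pass that feeds `loss.backward()`.

   ```python
   # Validation only: no graph is built inside this block.
   with torch.no_grad():
       outputs = model(inputs)
   ```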
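
The steps above can be combined into one self-contained sketch. The toy regression data, the `evaluate` helper, and the hyperparameters are assumptions made for illustration; only the ordering of the calls reflects the fix described here.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: y = 2x + 1 on 64 evenly spaced points.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1
dataloader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def evaluate():
    """Illustrative helper: full-dataset loss without building a graph."""
    with torch.no_grad():  # inference only, no backward() follows
        return criterion(model(x), y).item()


initial_loss = evaluate()

for epoch in range(20):
    for inputs, targets in dataloader:
        outputs = model(inputs)             # forward pass with autograd on
        loss = criterion(outputs, targets)
        optimizer.zero_grad()               # clear last iteration's gradients
        loss.backward()                     # compute fresh gradients
        optimizer.step()                    # apply them

final_loss = evaluate()
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Keeping `torch.no_grad()` inside a separate evaluation helper makes it harder to wrap the training forward pass in it by accident.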

## Failed Attempts

- **Call `optimizer.zero_grad()` before `loss.backward()` to reset gradients:** `zero_grad()` only clears gradients; it does not compute them. The core issue is the missing `backward()` call, not gradient accumulation. (80% failure rate)
- **Set `requires_grad=False` on all model parameters:** This disables gradient computation entirely, which makes the optimizer step meaningless and prevents learning. (95% failure rate)
- **Call the learning rate scheduler's `step()` before the optimizer's `step()`:** The scheduler does not trigger gradient computation; it only adjusts the learning rate, so the error persists. (90% failure rate)
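
Why these attempts fail can be checked directly. The sketch below assumes a toy `nn.Linear` model, `torch.optim.SGD`, and a `StepLR` scheduler, all picked for illustration; `has_grads` is a hypothetical helper.

```python
import warnings

import torch
from torch import nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)


def has_grads():
    """Illustrative helper: True if any parameter holds a gradient."""
    return any(p.grad is not None for p in model.parameters())


# zero_grad() resets gradients but never computes them.
optimizer.zero_grad()
after_zero_grad = has_grads()

# The scheduler only rescales the learning rate.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # PyTorch warns about this call order
    scheduler.step()
after_scheduler = has_grads()
lr_after_scheduler = optimizer.param_groups[0]["lr"]

# With every parameter frozen, the loss has no graph to backpropagate through.
for p in model.parameters():
    p.requires_grad_(False)
loss = model(torch.randn(4, 3)).mean()
try:
    loss.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

print(after_zero_grad, after_scheduler, lr_after_scheduler, backward_failed)
```

In each case the parameters end up without gradients, so only a `loss.backward()` call on a loss that is attached to the graph resolves the problem.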