# RuntimeError: cuDNN error when calling cudnnBatchNormalizationForwardTraining: CUDNN_STATUS_BAD_PARAM, epsilon < 0

- **ID:** `cuda/cudnn-bn-epsilon-negative`
- **Domain:** cuda
- **Category:** runtime_error
- **Error code:** `CUDNN_STATUS_BAD_PARAM` (enum value 3 in cuDNN 8.x, 2000 in cuDNN 9.x)
- **Verification level:** ai_generated
- **Fix rate:** 93%

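## Root cause

cuDNN's batch-normalization routines require `epsilon >= 0` (the documented lower bound is the `CUDNN_BN_MIN_EPSILON` constant in `cudnn.h`); in practice epsilon is a small positive value such as 1e-5. Epsilon is added to the variance before the square root is taken, so a negative value can make `var + epsilon` zero or negative and leave the normalization undefined. cuDNN therefore rejects it as a bad parameter.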
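
To make the failure mode concrete, the sketch below evaluates the textbook normalization step `(x - mean) / sqrt(var + eps)` in plain Python. It is an illustration only, not cuDNN's implementation, and the function name `normalize` is hypothetical.

```python
import math


def normalize(values, eps):
    """Textbook batch normalization without scale and shift (illustrative only)."""
    mean = sum(values) / len(values)
    # Population (biased) variance of the batch.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    denom_sq = var + eps
    if denom_sq <= 0:
        # The square root would be undefined, or the division would be by zero.
        raise ValueError(f"var + eps = {denom_sq} is not positive")
    return [(v - mean) / math.sqrt(denom_sq) for v in values]


batch = [1.0, 1.0, 1.0, 1.0]   # constant channel, so var == 0
ok = normalize(batch, 1e-5)    # works: the denominator is sqrt(1e-5)

try:
    normalize(batch, -1e-5)    # var + eps < 0
    negative_eps_failed = False
except ValueError:
    negative_eps_failed = True
```

A constant channel is the worst case: its variance is exactly zero, so epsilon alone keeps the denominator positive.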

## Version compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| cuDNN 8.9.0 | active | — | — |
| cuDNN 9.0.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Solutions

1. Make sure epsilon is a small positive float, typically `1e-5`. For example, reset a negative value before it is used: `if (epsilon < 0) epsilon = 1e-5;`
2. Add a validation check before the cuDNN call that clamps epsilon to a minimum positive value, for example `epsilon = max(epsilon, 1e-7);`
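
The two fixes above could be combined into one small helper, sketched here in plain Python. The name `sanitize_bn_epsilon` and its `default` and `floor` parameters are hypothetical, chosen to mirror the `1e-5` and `1e-7` values mentioned above; the warning is there so that the bad value is reported rather than silently replaced.

```python
import math
import warnings


def sanitize_bn_epsilon(eps, default=1e-5, floor=1e-7):
    """Return an epsilon that is safe to pass to a batch-norm call.

    Hypothetical helper: negative or non-finite values fall back to `default`,
    and anything smaller than `floor` is clamped up to `floor`.
    """
    eps = float(eps)
    if not math.isfinite(eps) or eps < 0:
        warnings.warn(f"invalid batch-norm epsilon {eps!r}; using {default}")
        return default
    return max(eps, floor)
```

In PyTorch the result would typically be passed as the `eps` argument of a layer, for example `torch.nn.BatchNorm2d(64, eps=sanitize_bn_epsilon(cfg_eps))`, where `cfg_eps` stands in for wherever your configuration stores the value.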

## Ineffective attempts

- **Setting epsilon to a very large value (e.g., 1.0).** cuDNN does not raise an error, but dividing by sqrt(var + 1.0) no longer normalizes activations to unit variance, which typically hurts training accuracy, and it masks the bug that produced the negative value in the first place. (70% failure rate)
- **Disabling cuDNN (`torch.backends.cudnn.enabled = False`).** This forces a fallback to PyTorch's own batch-norm implementation, which may accept a negative epsilon but can then produce NaN or otherwise incorrect outputs and gradients, for example when var + epsilon is not positive. (60% failure rate)
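
As a rough check on the large-epsilon case, the sketch below computes the standard deviation a batch has after textbook normalization, which is `sqrt(var / (var + eps))`. The function name `output_std` and the sample batch are illustrative assumptions, not values taken from a real model.

```python
import math


def output_std(values, eps):
    """Standard deviation of a batch after textbook batch normalization."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Normalized values are (v - mean) / sqrt(var + eps), so their standard
    # deviation is sqrt(var) / sqrt(var + eps) = sqrt(var / (var + eps)).
    return math.sqrt(var / (var + eps))


batch = [0.0, 2.0, 0.0, 2.0]             # mean 1.0, variance 1.0
std_small_eps = output_std(batch, 1e-5)  # close to 1.0, as intended
std_large_eps = output_std(batch, 1.0)   # sqrt(1/2), about 0.707
```

The distortion grows as the channel variance shrinks relative to epsilon, so low-variance channels are affected most.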
