# RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnBatchNormalizationForwardTraining with epsilon < 0

- **ID:** `cuda/cudnn-bn-epsilon-negative`
- **Domain:** cuda
- **Category:** runtime_error
- **Error Code:** `CUDNN_STATUS_BAD_PARAM (4)`
- **Verification:** ai_generated
- **Fix Rate:** 93%

## Root Cause

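cuDNN batch normalization routines require `epsilon` to be non-negative. The documented lower bound is `CUDNN_BN_MIN_EPSILON` from the cuDNN headers, and typical values are small and positive (e.g., 1e-5). Epsilon is added to the batch variance before the square root is taken, so a negative value can make `var + epsilon` zero or negative and leave the normalization undefined. cuDNN therefore rejects it with `CUDNN_STATUS_BAD_PARAM`.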
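
To illustrate why a negative epsilon is ill-defined, here is a minimal pure-Python sketch of the normalization step. It does not call cuDNN; the `normalize` helper and the sample batch are hypothetical and only mirror the `1 / sqrt(var + epsilon)` term of the batch normalization formula.

```python
import math


def normalize(values, epsilon):
    """Center values and scale them by 1 / sqrt(var + epsilon)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # math.sqrt raises ValueError when var + epsilon is negative.
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [(v - mean) * inv_std for v in values]


constant_batch = [2.0, 2.0, 2.0, 2.0]  # variance is exactly 0

# A small positive epsilon keeps the denominator well-defined.
print(normalize(constant_batch, 1e-5))

# A negative epsilon makes var + epsilon negative, so the square root fails.
try:
    normalize(constant_batch, -1e-5)
except ValueError as exc:
    print("negative epsilon failed:", exc)
```

A GPU kernel would typically produce NaN here rather than raise, which is one plausible reason the library validates the parameter up front.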

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| cuDNN 8.9.0 | active | — | — |
| cuDNN 9.0.0 | active | — | — |
| PyTorch 2.0.0 | active | — | — |
| PyTorch 2.1.0 | active | — | — |

## Workarounds

1. **Ensure epsilon is a small positive float (typically 1e-5), resetting it if it is negative.** (95% success)
   ```
   if (epsilon < 0) epsilon = 1e-5;
   ```
2. **Add a validation check before the cuDNN call that clamps epsilon to a minimum positive value.** (90% success)
   ```
   epsilon = max(epsilon, 1e-7);
   ```
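
The two workarounds can be sketched as small validation helpers that run before epsilon reaches a batch normalization call. This is an illustrative sketch only: the function names `reset_if_negative` and `clamp_epsilon` are hypothetical, and the default values 1e-5 and 1e-7 are taken from the workarounds above rather than from any library.

```python
import math


def reset_if_negative(epsilon, fallback=1e-5):
    """Workaround 1: replace a negative epsilon with a small positive default."""
    if not math.isfinite(epsilon):
        raise ValueError(f"epsilon must be finite, got {epsilon!r}")
    return fallback if epsilon < 0 else epsilon


def clamp_epsilon(epsilon, floor=1e-7):
    """Workaround 2: clamp epsilon so it never drops below a positive floor."""
    if not math.isfinite(epsilon):
        raise ValueError(f"epsilon must be finite, got {epsilon!r}")
    return max(epsilon, floor)


print(reset_if_negative(-0.001))  # falls back to the default
print(clamp_epsilon(0.0))         # raised to the floor
print(clamp_epsilon(1e-3))        # already valid, returned unchanged
```

Rejecting NaN and infinity explicitly is a design choice here: `max(nan, floor)` would otherwise pass the NaN through silently. Either helper also hides where the bad value came from, so logging the original epsilon before replacing it may be worthwhile.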

## Dead Ends

- **Setting epsilon to a very large value (e.g., 1.0):** cuDNN does not error out, but when the variance is small the denominator sqrt(var + 1.0) stays close to 1, so activations are barely normalized and training accuracy suffers. This also masks the real issue, which is where the negative value came from. (70% fail)
- **Disabling cuDNN (`torch.backends.cudnn.enabled = False`):** this forces a fallback to PyTorch's own batch normalization implementation, which may accept a negative epsilon but can then produce NaNs or incorrect outputs and gradients whenever var + epsilon is not positive. (60% fail)
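
The effect of an oversized epsilon can be checked numerically. The sketch below is pure Python with a hypothetical helper and a made-up batch; it compares the variance of the normalized output for a typical epsilon and for epsilon = 1.0.

```python
def normalized_variance(values, epsilon):
    """Variance of the batch after scaling by 1 / sqrt(var + epsilon)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Scaling centered values by 1 / sqrt(var + epsilon) scales
    # their variance by 1 / (var + epsilon).
    return var / (var + epsilon)


batch = [0.0, 0.1, 0.2, 0.3]  # low-variance activations (illustrative data)

print(normalized_variance(batch, 1e-5))  # close to 1: properly normalized
print(normalized_variance(batch, 1.0))   # far below 1: barely normalized
```

With a low-variance batch, epsilon dominates the denominator, so the output keeps roughly its original small scale instead of being brought to unit variance.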
