# RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnBatchNormalizationForwardTraining with epsilon=1e-06

- **ID:** `cuda/cudnn-bn-epsilon-too-small`
- **Domain:** cuda
- **Category:** runtime_error
- **Error Code:** `CUDNN_STATUS_BAD_PARAM`
- **Verification:** ai_generated
- **Fix Rate:** 92%

## Root Cause

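cuDNN batch normalization rejects an epsilon below its minimum allowed value, given here as 1e-5 (higher values are advisable for data types such as float16 to avoid numerical instability). An epsilon of 1e-6 falls below that minimum, so `cudnnBatchNormalizationForwardTraining` returns `CUDNN_STATUS_BAD_PARAM`. The minimum is defined by the `CUDNN_BN_MIN_EPSILON` constant in `cudnn.h`; check the header for the installed version, because the value may differ between cuDNN releases.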
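
To locate the affected layers before changing anything, a scan along the following lines may help. It is a minimal sketch: `find_small_eps`, `BN_TYPES`, and `MIN_EPS` are hypothetical names, and the 1e-5 threshold is taken from the description above rather than read from the installed cuDNN.

```python
import torch.nn as nn

# Assumed threshold from the root-cause description; the authoritative value
# is CUDNN_BN_MIN_EPSILON in the installed cudnn.h.
MIN_EPS = 1e-5

# Batch norm layer types to inspect (hypothetical helper constant).
BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)


def find_small_eps(model: nn.Module, min_eps: float = MIN_EPS):
    """Return (name, eps) for every batch norm layer whose eps is below min_eps."""
    return [
        (name, module.eps)
        for name, module in model.named_modules()
        if isinstance(module, BN_TYPES) and module.eps < min_eps
    ]


# Small example model: one layer below the threshold, one exactly at it.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8, eps=1e-6),  # below the threshold, reported
    nn.ReLU(),
    nn.BatchNorm2d(8, eps=1e-5),  # at the threshold, not reported
)
offenders = find_small_eps(model)
print(offenders)  # [('1', 1e-06)]
```

The names in the result are the dotted module paths from `named_modules()`, so they can be used to find the layer definitions in the model source.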

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| cuDNN 8.9.5 | active | — | — |
| cuDNN 9.0 | active | — | — |
| PyTorch 2.0 | active | — | — |
| PyTorch 2.1 | active | — | — |

## Workarounds

1. **Set epsilon to a value >= 1e-5, for example `nn.BatchNorm2d(num_features, eps=1e-5)` in PyTorch. For float16 models, use `eps=1e-4` or higher. This is the recommended fix.** (95% success)
   ```
   import torch.nn as nn
   bn = nn.BatchNorm2d(64, eps=1e-5)  # 64 is an example channel count; use eps=1e-4 or higher for float16
   ```
2. **If a pre-trained model has a hardcoded epsilon, override it after loading with `model.bn_layer.eps = 1e-5`, then reset the batch norm running statistics if needed.** (90% success)
   ```
   model.bn_layer.eps = 1e-5             # bn_layer stands for the affected layer in your model
   model.bn_layer.reset_running_stats()  # optional: resets running mean and variance, which then need re-estimating
   ```
3. **Keep only the batch normalization layers in float32: `model.bn_layer = model.bn_layer.float()`. This allows smaller epsilon values but may increase memory usage.** (70% success)
   ```
   model.bn_layer = model.bn_layer.float()  # casts this layer's floating-point parameters and buffers to float32
   ```
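
For models with many batch norm layers, workarounds 2 and 3 can be applied in bulk rather than one attribute at a time. The sketch below is one possible way to do that, not a definitive implementation: `raise_bn_eps`, `bn_to_float32`, and `BN_TYPES` are hypothetical names, and the 1e-5 default is the threshold assumed in this entry.

```python
import torch
import torch.nn as nn

# Batch norm layer types to patch (hypothetical helper constant).
BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)


def raise_bn_eps(model: nn.Module, min_eps: float = 1e-5):
    """Raise eps in place on every batch norm layer below min_eps.

    Returns the names of the layers that were changed.
    """
    changed = []
    for name, module in model.named_modules():
        if isinstance(module, BN_TYPES) and module.eps < min_eps:
            module.eps = min_eps
            changed.append(name)
    return changed


def bn_to_float32(model: nn.Module) -> nn.Module:
    """Cast only the batch norm layers' floating-point tensors to float32."""
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.float()  # in place; integer buffers are left untouched
    return model


# Example: a float16 model with one batch norm layer below the threshold.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8, eps=1e-6),
).half()

changed = raise_bn_eps(model)  # workaround 2, applied to all layers
bn_to_float32(model)           # workaround 3, applied to all layers
print(changed, model[1].eps, model[1].weight.dtype, model[0].weight.dtype)
```

The example only patches attributes and dtypes and does not run a forward pass. Whether a float16 activation can be fed into a float32 batch norm layer without an explicit cast depends on the backend, so test that on the target device.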

## Dead Ends

- **Setting epsilon to a very large value** — While it avoids the BAD_PARAM error, a large epsilon reduces the effectiveness of batch normalization, potentially degrading model accuracy. (30% fail)
- **Disabling cuDNN entirely (`torch.backends.cudnn.enabled = False`)** — This works but disables all cuDNN optimizations, significantly slowing down training. It is an overreaction when only the epsilon is wrong. (10% fail)
- **Ignoring the error and continuing** — The error halts execution immediately; ignoring it is not possible without modifying the source code to catch the exception. (100% fail)
