CUDNN_STATUS_BAD_PARAM (4) cuda runtime_error ai_generated true

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when calling cudnnBatchNormalizationForwardTraining with epsilon < 0

ID: cuda/cudnn-bn-epsilon-negative


Fix Rate: 93%

Confidence: 86%

Evidence: 1

First Seen: 2023-05-08

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
cuDNN 8.9.0	active	—	—	—
cuDNN 9.0.0	active	—	—	—
PyTorch 2.0.0	active	—	—	—
PyTorch 2.1.0	active	—	—	—

Root Cause

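cuDNN's batch normalization routines require epsilon >= 0; in practice it is a small positive value such as 1e-5. Epsilon is added to the variance before the square root is taken, so a negative value can make var + epsilon negative and leave the normalization undefined. cuDNN therefore rejects a negative epsilon with CUDNN_STATUS_BAD_PARAM.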
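
To make the root cause concrete, here is a minimal pure-Python sketch of the normalization step, (x - mean) / sqrt(var + eps). It is not cuDNN's implementation; `batch_norm_1d` is a hypothetical helper written only for illustration, and it uses the biased variance. On a constant channel the variance is exactly 0, so any negative epsilon puts a negative number under the square root.

```python
import math

def batch_norm_1d(values, eps):
    """Normalize a list of floats as (x - mean) / sqrt(var + eps)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased (population) variance
    denom = math.sqrt(var + eps)  # raises ValueError when var + eps < 0
    return [(v - mean) / denom for v in values]

constant_channel = [2.0, 2.0, 2.0, 2.0]  # variance is exactly 0

# A small positive epsilon keeps the denominator well defined.
ok = batch_norm_1d(constant_channel, eps=1e-5)

# A negative epsilon makes var + eps negative, so the square root fails.
try:
    batch_norm_1d(constant_channel, eps=-1e-5)
    negative_eps_failed = False
except ValueError:
    negative_eps_failed = True
```

Here the failure surfaces as a Python `ValueError`; cuDNN instead checks the argument up front and returns CUDNN_STATUS_BAD_PARAM.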



Official Documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnBatchNormalizationForwardTraining

Workarounds

95% success: Ensure epsilon is a small positive float, typically 1e-5. Example:
```
// Replace a negative epsilon with the conventional default.
if (epsilon < 0) epsilon = 1e-5;
```
90% success: Add a validation check before the cuDNN call that clamps epsilon to a minimum positive value. Example:
```
// Clamp epsilon to a small positive floor (std::max from <algorithm>; epsilon is a double).
epsilon = std::max(epsilon, 1e-7);
```
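
Both workarounds above silently repair the value. A negative epsilon often points to an upstream mistake, such as a sign error or a badly parsed config, so an alternative is to fail early on the Python side. The following is a sketch only: `validate_bn_epsilon` is a hypothetical helper, not part of PyTorch or cuDNN, and it assumes epsilon is available as a plain number before the layer is built.

```python
import math

def validate_bn_epsilon(eps):
    """Return eps as a float if it is usable for batch norm, else raise ValueError."""
    eps = float(eps)
    if math.isnan(eps) or math.isinf(eps):
        raise ValueError(f"batch-norm epsilon must be finite, got {eps!r}")
    if eps < 0.0:
        raise ValueError(f"batch-norm epsilon must be >= 0, got {eps!r}")
    return eps

# Typical value passes through unchanged.
checked = validate_bn_epsilon(1e-5)

# A negative value is reported instead of being silently replaced.
try:
    validate_bn_epsilon(-1e-5)
    rejected = False
except ValueError:
    rejected = True
```

Raising instead of clamping keeps the bad value visible at the point where it enters the program, which makes the original source of the negative number easier to find.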


Dead Ends

Common approaches that don't work:

70% fail
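Setting epsilon to a very large value (e.g., 1.0) does not trigger a cuDNN error, but it distorts the normalization: dividing by sqrt(var + 1.0) leaves low-variance channels almost unscaled, which tends to hurt training accuracy. It also masks the real issue, namely where the negative value came from.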
60% fail
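Disabling cuDNN (torch.backends.cudnn.enabled = False) turns off cuDNN for all operations, not only batch normalization, and forces a fallback to PyTorch's native implementation. That path may accept a negative epsilon without raising, but var + eps can then be zero or negative, so outputs and gradients may be NaN or otherwise incorrect.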
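
The first dead end above can be checked with a few lines of arithmetic. This sketch uses only the standard library; `bn_scale` is a hypothetical helper and the variance of 0.01 is an arbitrary illustrative value. It compares the standard deviation of a low-variance channel after normalization with a typical epsilon and with epsilon = 1.0.

```python
import math

def bn_scale(var, eps):
    """Factor that batch norm multiplies (x - mean) by: 1 / sqrt(var + eps)."""
    return 1.0 / math.sqrt(var + eps)

var = 0.01  # a low-variance channel, standard deviation 0.1

# Standard deviation of the normalized output for each epsilon.
output_std = {eps: math.sqrt(var) * bn_scale(var, eps) for eps in (1e-5, 1.0)}

for eps, std in output_std.items():
    print(f"eps={eps:g}: output std = {std:.4f}")
```

With epsilon = 1e-5 the output standard deviation is close to 1, as batch normalization intends. With epsilon = 1.0 it stays near 0.1, so the channel is effectively left unnormalized.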