CUDNN_STATUS_BAD_PARAM
cuda
type_error
ai_generated
true
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when setting batch normalization epsilon to 0
ID: cuda/cudnn-bn-epsilon-nan
Fix Rate: 95%
Confidence: 90%
Evidence: 1
First Seen: 2023-06-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| cuDNN 8.6 | active | — | — | — |
| cuDNN 8.9 | active | — | — | — |
| PyTorch 2.0 | active | — | — | — |
| PyTorch 2.2 | active | — | — | — |
Root Cause
cuDNN batch normalization requires a positive epsilon value to avoid division by zero; setting epsilon to 0 triggers a parameter validation error.
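To make the division-by-zero concern concrete, here is a minimal pure-Python sketch of the standard batch normalization formula, (x - mean) / sqrt(var + eps). The helper name `batch_norm_1d` is hypothetical and this is not cuDNN's implementation; it only illustrates why the denominator needs a positive epsilon when a batch has zero variance.

```python
import math


def batch_norm_1d(values, eps):
    """Normalize a list of floats as (x - mean) / sqrt(var + eps).

    Illustrative only: no learned scale/shift and no running statistics.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n  # biased (population) variance
    denom = math.sqrt(var + eps)
    return [(x - mean) / denom for x in values]


constant_batch = [2.0, 2.0, 2.0, 2.0]  # every element equal, so variance is 0

# With a positive eps the denominator is sqrt(1e-5), so the result is well defined.
print(batch_norm_1d(constant_batch, eps=1e-5))

# With eps = 0 the denominator is exactly 0 for this batch.
try:
    batch_norm_1d(constant_batch, eps=0.0)
except ZeroDivisionError:
    print("eps=0 with zero variance divides by zero")
```

Plain Python raises an exception here; tensor libraries typically produce NaN or inf for the same arithmetic instead of raising.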
Official Documentation
https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnn-batch-normalization
Workarounds
- 95% success: Set epsilon to a positive value >= 1e-5 in the batch normalization layer definition. For PyTorch, use `nn.BatchNorm2d(num_features, eps=1e-5)`.
- 90% success: If epsilon is loaded from a config file, add validation to clamp it to a minimum of 1e-5 before passing it to the layer.
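The config-validation workaround above could be sketched as follows. The names `MIN_BN_EPS`, `validated_eps`, and the `bn_eps` config key are hypothetical, chosen only for illustration; the 1e-5 floor is the value recommended in the workarounds.

```python
import math

MIN_BN_EPS = 1e-5  # floor taken from the workarounds; name is an assumption


def validated_eps(raw_eps, minimum=MIN_BN_EPS):
    """Return an epsilon that is safe to pass to a batch norm layer.

    Values below `minimum` (including 0 and negatives) are clamped up to it.
    NaN is rejected because clamping cannot repair it.
    """
    eps = float(raw_eps)  # accepts numbers or numeric strings from a config file
    if math.isnan(eps):
        raise ValueError("batch norm epsilon must not be NaN")
    return max(eps, minimum)


# Hypothetical config dict, standing in for whatever the project loads.
config = {"bn_eps": 0.0}
eps = validated_eps(config.get("bn_eps", MIN_BN_EPS))
print(eps)  # 1e-05
```

The validated value can then be passed to the layer, for example `nn.BatchNorm2d(num_features, eps=eps)`. Clamping silently changes the configured value, so logging a warning when the clamp triggers may be preferable in practice.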
Dead Ends
Common approaches that don't work:
- 50% fail: While this avoids the error, it may cause numerical instability in batch normalization; recommended minimum is 1e-5.
- 80% fail: This changes the model architecture and may degrade accuracy; overkill for a parameter fix.