CUDNN_STATUS_BAD_PARAM cuda type_error ai_generated true


RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM when setting batch normalization epsilon to 0

ID: cuda/cudnn-bn-epsilon-nan

Fix rate: 95%
Confidence: 90%
Evidence count: 1
First seen: 2023-06-20

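Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| cuDNN 8.6 | active | | | |
| cuDNN 8.9 | active | | | |
| PyTorch 2.0 | active | | | |
| PyTorch 2.2 | active | | | |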

Root cause analysis


cuDNN batch normalization divides each channel by sqrt(variance + epsilon), so it requires a positive epsilon to avoid division by zero when a channel's variance is zero; setting epsilon to 0 triggers a parameter validation error.
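A minimal pure-Python sketch of the per-channel arithmetic, assuming the textbook batch-norm formula (x - mean) / sqrt(var + eps) with biased variance. It illustrates why eps = 0 is a problem; it is not cuDNN's kernel, and the helper name `batch_norm_1d` is hypothetical. A channel whose values are all equal has zero variance, so with eps = 0 the denominator is exactly zero.

```python
import math


def batch_norm_1d(values, eps):
    """Normalize one channel as (x - mean) / sqrt(var + eps), using biased variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    denom = math.sqrt(var + eps)
    if denom == 0.0:
        # Python raises ZeroDivisionError on 0/0; IEEE-754 GPU arithmetic yields NaN instead.
        return [float("nan")] * n
    return [(v - mean) / denom for v in values]


constant_channel = [3.0, 3.0, 3.0, 3.0]  # zero variance
print(batch_norm_1d(constant_channel, eps=0.0))   # [nan, nan, nan, nan]
print(batch_norm_1d(constant_channel, eps=1e-5))  # [0.0, 0.0, 0.0, 0.0]
```

With a positive epsilon the denominator can never reach zero, which is the property the validation check protects.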

generic

Official documentation

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnn-batch-normalization

Solutions

  1. Set epsilon to a positive value >= 1e-5 in the batch normalization layer definition. For PyTorch, use `nn.BatchNorm2d(num_features, eps=1e-5)`.
  2. If epsilon is loaded from a config file, add validation that clamps it to a minimum of 1e-5 before passing it to the layer.
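Solution 2 can be sketched as a small validation helper. This is a sketch under stated assumptions: the config is a plain dict, the key `"bn_eps"`, the constant `MIN_BN_EPS` and the function `sanitize_bn_eps` are all hypothetical names, and the 1e-5 floor is the minimum recommended in this entry. It also guards against NaN, which `max()` alone would silently pass through.

```python
import logging
import math

# Recommended floor from this entry; the constant name is hypothetical.
MIN_BN_EPS = 1e-5


def sanitize_bn_eps(config, key="bn_eps", default=1e-5):
    """Read epsilon from a config mapping and clamp it to MIN_BN_EPS.

    The key name "bn_eps" is a hypothetical example; adapt it to your config schema.
    """
    raw = config.get(key, default)
    try:
        eps = float(raw)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"{key} must be numeric, got {raw!r}") from exc
    # NaN fails every comparison, so it must be checked explicitly.
    if math.isnan(eps) or eps < MIN_BN_EPS:
        logging.warning("%s=%r is invalid or below %g; using %g",
                        key, raw, MIN_BN_EPS, MIN_BN_EPS)
        return MIN_BN_EPS
    return eps


if __name__ == "__main__":
    cfg = {"bn_eps": 0}  # e.g. a value loaded from a YAML or JSON file
    print(sanitize_bn_eps(cfg))  # 1e-05
```

In PyTorch the returned value would then be passed as `nn.BatchNorm2d(num_features, eps=sanitize_bn_eps(cfg))`. Clamping with a warning, rather than raising, keeps old configs loadable while making the silent change visible in logs.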

Ineffective attempts

Common but ineffective approaches:

  1. 50% failure rate

    While this avoids the error, it may cause numerical instability in batch normalization; the recommended minimum is 1e-5.

  2. 80% failure rate

    This changes the model architecture and may degrade accuracy; it is overkill for what is only a parameter fix.
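The caveat on attempt 1, that an epsilon far below 1e-5 may destabilize batch normalization, can be illustrated with one possible mechanism. The pure-Python sketch below uses the textbook formula with a hypothetical helper `normalize` (not cuDNN's kernel). For a nearly constant channel, a tiny epsilon lets a 1e-6 perturbation be rescaled into an output of roughly ±0.45, while eps = 1e-5 keeps the output near zero.

```python
import math


def normalize(values, eps):
    """Per-channel (x - mean) / sqrt(var + eps), using biased variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# A channel that is constant apart from a small 1e-6 perturbation.
channel = [1.0, 1.0 + 1e-6]

tiny = normalize(channel, eps=1e-12)  # epsilon far below the recommended 1e-5
safe = normalize(channel, eps=1e-5)   # recommended minimum

print([round(x, 3) for x in tiny])  # [-0.447, 0.447]
print([round(x, 6) for x in safe])  # [-0.000158, 0.000158]
```

The tiny epsilon makes the output depend almost entirely on a negligible input difference, which is the kind of sensitivity the 1e-5 floor avoids.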