NAN_LOSS tensorflow runtime_error ai_generated partial

tensorflow.python.framework.errors_impl.InvalidArgumentError: Loss is inf or nan : Tensor had NaN values

ID: tensorflow/optimizer-nan-loss

Fix rate: 75%
Confidence: 90%
Evidence count: 1
First seen: 2023-03-12

Version compatibility

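| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| tensorflow 2.8.0 | active | | | |
| tensorflow 2.9.0 | active | | | |
| tensorflow 2.10.0 | active | | | |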

Root cause analysis

The loss evaluated to NaN or inf, most often because of exploding gradients, division by zero, or taking the log of zero inside the loss computation.
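A minimal NumPy sketch of how each of these causes turns a float32 value into inf or NaN. NumPy is used here only as a stand-in for TensorFlow's float32 arithmetic, which follows the same IEEE 754 rules; the label and prediction values are made up for illustration. Note the first case: even a correct prediction of exactly 0 for a zero label yields NaN, because the term is 0 * log(0) = 0 * -inf.

```python
import numpy as np

# float32 mirrors TensorFlow's default floating-point dtype.
f32 = np.float32

with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
    # 1) log of zero: a predicted probability of exactly 0 gives -inf,
    #    and multiplying that by a zero label gives 0 * -inf = NaN.
    y_true = np.array([1.0, 0.0], dtype=f32)
    y_pred = np.array([1.0, 0.0], dtype=f32)
    log_term = y_true * np.log(y_pred)   # [1 * log(1), 0 * log(0)] -> [0, nan]
    ce_loss = -np.sum(log_term)          # NaN propagates through the sum

    # 2) division by zero: 0/0 is NaN, x/0 is inf.
    zero_over_zero = f32(0.0) / f32(0.0)
    x_over_zero = f32(1.0) / f32(0.0)

    # 3) overflow: float32 tops out near 3.4e38, so large activations or
    #    gradients overflow to inf, and inf - inf is NaN.
    big = np.exp(f32(100.0))             # exp(100) ~ 2.7e43 -> inf in float32
    inf_minus_inf = big - big

# Adding a small epsilon inside the log keeps every term finite.
eps = f32(1e-10)
safe_loss = -np.sum(y_true * np.log(y_pred + eps))

print(ce_loss, zero_over_zero, x_over_zero, big, inf_minus_inf, safe_loss)
```

Once any element is NaN, sums and means over it are NaN as well, so a single bad term is enough to make the whole loss non-finite.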

generic

Official documentation

https://www.tensorflow.org/guide/keras/train_and_evaluate

Solutions

  1. Add gradient clipping in the optimizer, e.g. `optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)`, or clip the gradients yourself with `tf.clip_by_global_norm` in a custom training loop. Also guard against log(0) by adding a small epsilon inside the log: `loss = -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-10))`.
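Keras's `clipnorm` clips each gradient tensor by its own L2 norm, whereas `tf.clip_by_global_norm` rescales all gradients together by their combined norm. The NumPy sketch below re-implements both rules from their documented formula (`t * clip_norm / max(norm, clip_norm)`) to show the difference; it is an illustration, not TensorFlow's implementation, and the gradient values are hypothetical.

```python
import numpy as np

def clip_by_norm(t, clip_norm):
    """Per-tensor rule (what `clipnorm` applies to each gradient separately)."""
    norm = np.linalg.norm(t)
    return t * clip_norm / max(norm, clip_norm)

def clip_by_global_norm(t_list, clip_norm):
    """Joint rule: one scale factor from the L2 norm of all tensors combined."""
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in t_list))
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list], global_norm

# Hypothetical gradients for two variables; the first one has exploded (norm 500).
grads = [np.array([300.0, 400.0]), np.array([0.3, 0.4])]

per_tensor = [clip_by_norm(g, 1.0) for g in grads]
joint, gnorm = clip_by_global_norm(grads, 1.0)

print(per_tensor)  # large gradient shrunk to norm 1, small one (norm 0.5) untouched
print(joint)       # both scaled by the same factor; combined norm is 1
```

Per-tensor clipping shrinks the exploded gradient but leaves the small one as is, which changes the overall update direction; global-norm clipping keeps the direction and only shortens the step. Keras optimizers in recent TF 2.x releases also accept a `global_clipnorm` argument if you want the joint rule without a custom training loop.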

Ineffective attempts

Common but ineffective approaches:

  1. 90% failure rate

    This may delay the NaN but does not fix the root cause; the loss can still explode later.

  2. 85% failure rate

    SGD is also susceptible to exploding gradients without clipping.
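The point that SGD is not immune can be illustrated with a toy sketch: plain gradient descent (the SGD update without momentum) on the hypothetical loss L(w) = 0.5 * k * w**2, using a learning rate above the stability limit 2/k. Without clipping, |w| grows by a factor of about 9 per step until the float32 gradient overflows to inf and a later update computes inf - inf = NaN; with the gradient clipped to norm 1, each step moves w by at most `lr`, so it stays bounded. The curvature, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

def run_sgd(steps, clip_norm=None):
    """Plain gradient descent on the toy loss L(w) = 0.5 * k * w**2 in float32."""
    k = np.float32(100.0)   # assumed curvature; stable only if lr < 2 / k = 0.02
    lr = np.float32(0.1)    # deliberately too large for this curvature
    w = np.float32(1.0)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            grad = k * w    # dL/dw
            if clip_norm is not None:
                c = np.float32(clip_norm)
                # For a scalar the L2 norm is |grad|; rescale so |grad| <= c.
                grad = grad * c / np.maximum(np.abs(grad), c)
            w = w - lr * grad
    return w

w_plain = run_sgd(60)                   # overflows to inf, then inf - inf gives NaN
w_clipped = run_sgd(60, clip_norm=1.0)  # each step moves w by at most lr
print(w_plain, w_clipped)
```

Clipping here keeps the run numerically alive rather than making this learning rate converge: w keeps oscillating, but it never leaves the range [-1, 1].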