NAN_LOSS
tensorflow
runtime_error
ai_generated
partial
tensorflow.python.framework.errors_impl.InvalidArgumentError: Loss is inf or nan : Tensor had NaN values
ID: tensorflow/optimizer-nan-loss
Fix rate: 75%
Confidence: 90%
Evidence count: 1
First seen: 2023-03-12
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| tensorflow 2.8.0 | active | — | — | — |
| tensorflow 2.9.0 | active | — | — | — |
| tensorflow 2.10.0 | active | — | — | — |
Root cause analysis
The loss function produced NaN values, typically because of exploding gradients, division by zero, or taking the log of zero in the loss computation.
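The following is a minimal sketch of how each of these causes turns into a NaN under floating-point arithmetic. The tensor values are made up for illustration, and the use of `tf.debugging.check_numerics` to surface the problem is an assumption about how one might detect it, not something this entry prescribes.

```python
import tensorflow as tf

# Log of zero: a predicted probability of exactly 0 gives log(0) = -inf.
y_true = tf.constant([0.0, 1.0])
y_pred = tf.constant([0.0, 1.0])
log_pred = tf.math.log(y_pred)            # [-inf, 0.]
naive_loss = -tf.reduce_sum(y_true * log_pred)  # 0 * -inf = nan, so the sum is nan

# Division by zero: 0 / 0 is nan for floating-point tensors.
ratio = tf.constant(0.0) / tf.constant(0.0)

# Exploding values: float32 overflows to inf, and inf - inf is nan.
big = tf.constant(1e30)
overflow = big * big                      # inf in float32
diff = overflow - overflow                # nan

# check_numerics raises InvalidArgumentError when a tensor holds NaN or Inf.
try:
    tf.debugging.check_numerics(naive_loss, "Loss is inf or nan")
    nan_detected = False
except tf.errors.InvalidArgumentError:
    nan_detected = True
```

Placing a check like this on the loss or on intermediate tensors can help narrow down which operation first produces a non-finite value.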
Official documentation
https://www.tensorflow.org/guide/keras/train_and_evaluate
Solution
- Add gradient clipping in the optimizer: `optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)`, or use `tf.clip_by_global_norm` in a custom training loop. Also guard against log(0) by adding a small epsilon: `loss = -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-10))`.
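The two clipping options and the epsilon guard can be sketched together as follows. The model, the data, and the helper names `safe_cross_entropy`, `build_model`, and `train_step` are hypothetical and exist only for illustration; only `clipnorm`, `tf.clip_by_global_norm`, and the epsilon expression come from the solution text.

```python
import tensorflow as tf

EPSILON = 1e-10  # small constant from the solution text; keeps log() away from 0


def safe_cross_entropy(y_true, y_pred):
    """Cross-entropy that stays finite even when y_pred contains exact zeros."""
    return -tf.reduce_sum(y_true * tf.math.log(y_pred + EPSILON))


def build_model():
    # Hypothetical two-feature binary classifier, for illustration only.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


def train_step(model, optimizer, x, y_true, clip_norm=None):
    """Run one update; optionally clip gradients by their global norm first."""
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = safe_cross_entropy(y_true, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    if clip_norm is not None:
        # Rescale all gradients jointly so their combined norm is <= clip_norm.
        grads, _ = tf.clip_by_global_norm(grads, clip_norm)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, grads


x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y_true = tf.constant([[1.0], [0.0]])

# Variant A: the optimizer clips each gradient tensor to norm <= 1.0.
model_a = build_model()
loss_a, _ = train_step(model_a, tf.keras.optimizers.Adam(clipnorm=1.0), x, y_true)

# Variant B: clip all gradients jointly inside the custom loop.
model_b = build_model()
loss_b, clipped_b = train_step(
    model_b, tf.keras.optimizers.Adam(), x, y_true, clip_norm=1.0
)
```

Note that `clipnorm` clips each gradient tensor separately, while `tf.clip_by_global_norm` rescales all gradients by one shared factor, which preserves the direction of the overall update.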
Ineffective attempts
Common but ineffective approaches:
- 90% failure rate. This may delay the NaN but does not fix the root cause; the loss can still explode later.
- 85% failure rate. SGD is also susceptible to exploding gradients without clipping.