NAN_LOSS tensorflow runtime_error ai_generated partial

tensorflow.python.framework.errors_impl.InvalidArgumentError: Loss is inf or nan : Tensor had NaN values

ID: tensorflow/optimizer-nan-loss

Fix rate: 75%

Confidence: 90%

Evidence count: 1

First seen: 2023-03-12

Version compatibility

Version	Status	Introduced	Deprecated	Notes
tensorflow 2.8.0	active	—	—	—
tensorflow 2.9.0	active	—	—	—
tensorflow 2.10.0	active	—	—	—

Root cause analysis

The loss function produced NaN values, often due to exploding gradients, division by zero, or log of zero in the loss computation.
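As a rough illustration of how these causes produce NaN, the sketch below reproduces each one on scalar tensors and then shows one way to catch the problem early with `tf.debugging.check_numerics`. The overflow case is only a stand-in for what exploding gradients eventually do to the loss; the variable names and the `"loss check"` message are made up for this example.

```python
import tensorflow as tf

zero = tf.constant(0.0)

# log(0) is -inf; multiplying it by a zero target gives NaN.
log_zero_term = zero * tf.math.log(zero)

# 0 / 0 is NaN for floating-point tensors (no exception is raised).
zero_div = zero / zero

# Overflow: exp(100) exceeds the float32 range and becomes inf; inf - inf is NaN.
big = tf.exp(tf.constant(100.0))
overflow_diff = big - big

results = [
    bool(tf.math.is_nan(t).numpy())
    for t in (log_zero_term, zero_div, overflow_diff)
]

# check_numerics raises InvalidArgumentError when a tensor holds NaN or inf.
try:
    tf.debugging.check_numerics(log_zero_term, "loss check")
    caught = False
except tf.errors.InvalidArgumentError:
    caught = True

print(results, caught)
```

Wrapping the loss tensor in `tf.debugging.check_numerics` during debugging can help locate the first step at which a non-finite value appears.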

generic

Official documentation

https://www.tensorflow.org/guide/keras/train_and_evaluate

Solution

Add gradient clipping to the optimizer, for example `optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)`, or clip the gradients yourself with `tf.clip_by_global_norm` in a custom training loop. Also guard against log(0) by adding a small epsilon inside the logarithm: `loss = -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-10))`.
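A minimal sketch of how the two suggestions might fit together in a custom training loop follows. The toy data, the single-layer model, and the helper names `safe_cross_entropy` and `train_step` are assumptions made for illustration, not part of the original entry; only `tf.clip_by_global_norm` and the epsilon inside the logarithm come from the solution above.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical toy data: 8 samples, 4 features, 3 one-hot classes.
x = tf.random.normal((8, 4))
y_true = tf.one_hot([0, 1, 2, 0, 1, 2, 0, 1], depth=3)

model = tf.keras.Sequential([tf.keras.layers.Dense(3, activation="softmax")])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)


def safe_cross_entropy(y_true, y_pred, eps=1e-10):
    # The epsilon keeps the logarithm finite when a predicted probability is 0.
    per_sample = -tf.reduce_sum(y_true * tf.math.log(y_pred + eps), axis=-1)
    return tf.reduce_mean(per_sample)


def train_step(x, y_true, max_norm=1.0):
    with tf.GradientTape() as tape:
        loss = safe_cross_entropy(y_true, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Rescale all gradients together so their global norm is at most max_norm.
    clipped, _ = tf.clip_by_global_norm(grads, max_norm)
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss, tf.linalg.global_norm(clipped)


loss, clipped_norm = train_step(x, y_true)
print(float(loss), float(clipped_norm))
```

When training with `model.fit`, passing `clipnorm=1.0` to the optimizer is the simpler route; the explicit loop is mainly useful when the gradients need to be inspected or clipped jointly across all variables.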

Ineffective attempts

Common but ineffective approaches:

90% failure rate
This may delay NaN but does not fix the root cause; the loss can still explode later.
85% failure rate
SGD is also susceptible to exploding gradients without clipping.