# tensorflow.python.framework.errors_impl.UnknownError: Failed to save checkpoint to /tmp/model.ckpt: IO error: No space left on device [Op:SaveV2]

- **ID:** `tensorflow/checkpoint-save-failed-io-error`
- **Domain:** tensorflow
- **Category:** resource_error
- **Error code:** `ESAV`
- **Verification level:** ai_generated
- **Fix rate:** 85%

## Root cause

The disk partition that holds the checkpoint directory has run out of inodes or blocks, so the SaveV2 op fails.
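
To tell block exhaustion from inode exhaustion, both can be read from Python using only the standard library. The following is a minimal sketch, not part of TensorFlow; the helper name `disk_report` is hypothetical, and the inode figures are only available on POSIX systems.

```python
import os
import shutil


def disk_report(path):
    """Return free space and free inodes for the filesystem that holds `path`.

    `path` must be an existing directory or file, for example the parent
    directory of the checkpoint prefix.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    report = {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "total_inodes": None,
        "free_inodes": None,
    }
    if hasattr(os, "statvfs"):  # os.statvfs exists on POSIX only
        st = os.statvfs(path)
        # Some filesystems report 0 total inodes; treat that as "not applicable".
        report["total_inodes"] = st.f_files
        report["free_inodes"] = st.f_favail  # inodes available to unprivileged users
    return report
```

For the path in the error message, `disk_report("/tmp")` would be the call to make, since `/tmp/model.ckpt` is a checkpoint prefix rather than a directory. A `free_bytes` value near zero points to block exhaustion; a `free_inodes` value of zero with a non-zero `total_inodes` points to inode exhaustion.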

## Version compatibility

| Version | Status | Introduced | Deprecated |
|------|------|------|------|
| 2.12 | active | — | — |
| 2.13 | active | — | — |
| 2.14 | active | — | — |

## Solutions

1. Check disk usage with `df -h` (blocks) and `df -i` (inodes), then delete unnecessary files or expand the partition. Alternatively, point the checkpoint path at a partition with more free space by passing a different directory to `tf.train.CheckpointManager`.
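2. If the checkpoint is being written on the wrong host (for example in a distributed job), pass `tf.train.CheckpointOptions(experimental_io_device='/job:localhost')` so that checkpoint I/O goes to the local host's filesystem. Adding `experimental_enable_async_checkpoint=True` moves the write off the training thread. Neither option compresses the checkpoint or reduces its size on disk, so they only help when the target filesystem they select has enough free space.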
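
The first solution can be sketched as follows, assuming a list of candidate directories and a rough estimate of the checkpoint size. The helper names `pick_checkpoint_dir` and `make_manager`, the candidate paths, and the use of `max_to_keep` to cap the number of retained checkpoints are illustrative assumptions, not part of the original entry.

```python
import os
import shutil


def pick_checkpoint_dir(candidates, required_bytes):
    """Return the first candidate directory with at least `required_bytes` free."""
    for directory in candidates:
        os.makedirs(directory, exist_ok=True)
        if shutil.disk_usage(directory).free >= required_bytes:
            return directory
    raise OSError(f"no candidate directory has {required_bytes} bytes free")


def make_manager(checkpoint, directory, max_to_keep=3):
    """Build a CheckpointManager that deletes all but the newest checkpoints."""
    # Imported lazily so pick_checkpoint_dir can be used without TensorFlow.
    import tensorflow as tf

    return tf.train.CheckpointManager(
        checkpoint, directory=directory, max_to_keep=max_to_keep
    )


# Example usage (paths and size estimate are hypothetical):
#   ckpt = tf.train.Checkpoint(model=model)
#   target = pick_checkpoint_dir(["/data/ckpt", "/tmp/ckpt"], 2 * 1024**3)
#   manager = make_manager(ckpt, target)
#   manager.save()
```

Checking free space before each save does not prevent the disk from filling up between the check and the write, so it is a guard rather than a guarantee. Limiting `max_to_keep` bounds how much space old checkpoints can consume over a long run.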

## Ineffective attempts

- **Delete random files in /tmp to free space**: The checkpoint path may not be in /tmp, and deleting unrelated files can cause other failures. (60% failure rate)
- **Set TF_CPP_MIN_LOG_LEVEL=2 to suppress the error**: Suppressing logs does not resolve the underlying disk space issue; the checkpoint will still not be saved. (90% failure rate)
- **Reduce batch size to reduce checkpoint size**: Checkpoint size is determined by the model parameters, not the batch size, so reducing the batch size does not free disk space. (80% failure rate)
