84 huggingface system_error ai_generated true

OSError: [Errno 84] Too many links while accessing dataset cache

ID: huggingface/dataset-cache-corruption

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2023-02-20

Version compatibility

Version               Status   Introduced   Deprecated   Notes
datasets>=2.10.0      active
transformers>=4.25.0  active

Root cause analysis

The Hugging Face datasets library creates a large number of symlinks in the cache directory, exceeding the filesystem's maximum link count. The exact limit depends on the filesystem: ext3 caps an inode at 32000 links, while ext4 allows 65000.
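
Before deleting anything, it can help to check how crowded the cache actually is. The following is a minimal diagnostic sketch, not part of the datasets library: the function name summarize_cache is hypothetical, and the path in the usage note assumes the library's default cache location. It counts symlinks, regular files and directories under a cache directory and reports the highest hard-link count seen on any entry.

```python
import os


def summarize_cache(cache_dir):
    """Count symlinks, regular files and directories below cache_dir.

    Hypothetical helper for illustration only. Symlinks are counted
    without being followed, so linked directories are not descended into.
    """
    symlinks = files = dirs = 0
    max_nlink = 0
    for root, dirnames, filenames in os.walk(cache_dir, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat inspects the link itself, not its target
            max_nlink = max(max_nlink, st.st_nlink)
            if os.path.islink(path):
                symlinks += 1
            elif os.path.isdir(path):
                dirs += 1
            else:
                files += 1
    return {
        "symlinks": symlinks,
        "files": files,
        "dirs": dirs,
        "max_nlink": max_nlink,
    }
```

For example, summarize_cache(os.path.expanduser("~/.cache/huggingface/datasets")) inspects the default location; if HF_DATASETS_CACHE is set, pass that path instead. Counts approaching the filesystem's link limit suggest that the cache is the likely culprit.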

generic

Official documentation

https://huggingface.co/docs/datasets/cache

Solutions

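  1. Remove the entire datasets cache directory (by default ~/.cache/huggingface/datasets). Calling datasets.disable_caching(), the replacement for the deprecated datasets.set_caching_enabled(False), stops the library from reusing cache files for transforms but does not delete an existing cache.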
  2. Set HF_DATASETS_CACHE to a filesystem with higher link limits (e.g., tmpfs or XFS).
  3. Use streaming mode to avoid caching entirely.
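
The three solutions above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions rather than an official recipe: the helper names (remove_cache_tree, relocate_cache, stream_first_examples) are hypothetical, the dataset name is supplied by the caller, and HF_DATASETS_CACHE should be set before datasets is imported for it to take effect.

```python
import itertools
import os
import shutil
from pathlib import Path


def remove_cache_tree(cache_dir):
    """Solution 1: delete the whole cache tree. Returns True if it existed."""
    cache_dir = Path(cache_dir)
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


def relocate_cache(new_dir):
    """Solution 2: point HF_DATASETS_CACHE at another filesystem.

    Call this before importing datasets; the library reads the
    variable when it is first imported.
    """
    new_dir = Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = str(new_dir)
    return new_dir


def stream_first_examples(dataset_name, split="train", n=3):
    """Solution 3: stream examples without materializing the dataset on disk."""
    from datasets import load_dataset  # imported lazily; needs datasets installed

    stream = load_dataset(dataset_name, split=split, streaming=True)
    return list(itertools.islice(stream, n))
```

A typical sequence would be to call remove_cache_tree on the old cache path, then relocate_cache with a directory on the new filesystem, and only afterwards import datasets. Note that tmpfs is backed by RAM, so a cache placed there is limited by available memory and is lost on reboot.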

Failed attempts

Common approaches that do not work:

  1. 80% failure rate

    Deleting individual files does not reduce the symlink count sufficiently; the entire cache tree must be removed.

  2. 95% failure rate

    Reinstalling the library does not affect the existing cache directory structure.