84 huggingface system_error ai_generated true

OSError: [Errno 84] Too many links while accessing dataset cache

ID: huggingface/dataset-cache-corruption

Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2023-02-20

Version Compatibility

Version                 Status   Introduced   Deprecated   Notes
datasets>=2.10.0        active   -            -            -
transformers>=4.25.0    active   -            -            -

Root Cause

The Hugging Face datasets library creates a large number of symlinks in the cache directory, and the cache eventually exceeds the filesystem's maximum link count. The limit depends on the filesystem: 32000 on ext3 and 65000 on ext4.
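
To check whether a cache tree is approaching a link limit, a short script can count the links it contains. The following is a minimal sketch using only the Python standard library; the helper names `link_stats` and `link_limit` are hypothetical and not part of the datasets library, and the default cache location `~/.cache/huggingface/datasets` is assumed unless `HF_DATASETS_CACHE` is set.

```python
import os
from pathlib import Path

# Assumed default location; HF_DATASETS_CACHE overrides it when set.
DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "datasets"


def link_stats(cache_dir):
    """Walk cache_dir and return (symlink_count, max_hard_link_count)."""
    symlinks = 0
    max_nlink = 0
    # followlinks=False: symlinked directories are listed but not descended into.
    for root, dirs, files in os.walk(cache_dir, followlinks=False):
        for name in dirs + files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat inspects the entry itself, not its target
            if os.path.islink(path):
                symlinks += 1
            max_nlink = max(max_nlink, st.st_nlink)
    return symlinks, max_nlink


def link_limit(path):
    """Return the filesystem's reported link limit for path, or None if unavailable."""
    try:
        return os.pathconf(path, "PC_LINK_MAX")
    except (OSError, ValueError, AttributeError):
        return None
```

Calling `link_stats(os.environ.get("HF_DATASETS_CACHE", DEFAULT_CACHE))` and comparing the second value against `link_limit(...)` gives a rough idea of how close the cache is to the limit. Not every filesystem reports a meaningful `PC_LINK_MAX`, so a `None` or surprising value should be treated as inconclusive.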

Official Documentation

https://huggingface.co/docs/datasets/cache

Workarounds

  1. 90% success Clear the entire datasets cache by removing the cache directory (by default ~/.cache/huggingface/datasets).
    Note that datasets.set_caching_enabled(False) only disables caching for later operations; it does not delete cache files that already exist.
  2. 85% success Set HF_DATASETS_CACHE to a directory on a filesystem with higher link limits (e.g., tmpfs or XFS).
    Set the variable before datasets is imported so that the new location takes effect.
  3. 75% success Use streaming mode to avoid caching entirely.
    Pass streaming=True to load_dataset so that examples are read on the fly instead of being written to the cache.
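
The first two workarounds can be sketched with the standard library alone. This is an illustrative sketch, not the library's own tooling: the helper names `resolve_cache_dir`, `clear_cache`, and `redirect_cache` are hypothetical, and the default cache path is assumed to be `~/.cache/huggingface/datasets`.

```python
import os
import shutil
from pathlib import Path

# Assumed default location; HF_DATASETS_CACHE overrides it when set.
DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "datasets"


def resolve_cache_dir():
    """Return the cache directory currently selected by the environment."""
    return Path(os.environ.get("HF_DATASETS_CACHE", DEFAULT_CACHE))


def clear_cache(cache_dir):
    """Remove the whole cache tree. Returns True if a directory was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return False
    shutil.rmtree(cache_dir)  # deletes every file, symlink, and subdirectory
    return True


def redirect_cache(new_dir):
    """Point HF_DATASETS_CACHE at new_dir, creating it if needed."""
    new_dir = Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    # Must run before `import datasets`, which reads the variable at import time.
    os.environ["HF_DATASETS_CACHE"] = str(new_dir)
    return new_dir
```

A typical sequence would be `clear_cache(resolve_cache_dir())` followed by `redirect_cache(...)` with a path on the target filesystem, and only then `import datasets`. For the third workaround, `load_dataset(name, streaming=True)` returns an iterable dataset that yields examples without building the on-disk cache.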

Dead Ends

Common approaches that don't work:

  1. 80% fail Deleting individual files from the cache.

    Individual file deletion does not reduce the symlink count sufficiently; the entire cache tree must be removed.

  2. 95% fail Reinstalling the library.

    Reinstallation does not affect the existing cache directory structure.