84
huggingface
system_error
ai_generated: true
OSError: [Errno 84] Too many links while accessing dataset cache
ID: huggingface/dataset-cache-corruption
Fix Rate: 85%
Confidence: 88%
Evidence: 1
First Seen: 2023-02-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| datasets>=2.10.0 | active | — | — | — |
| transformers>=4.25.0 | active | — | — | — |
Root Cause
The Hugging Face datasets library creates many symlinks in the cache directory, exceeding the filesystem's maximum link count (for example, 32000 on ext3 and 65000 on ext4).
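A quick diagnostic can show whether a cache directory is approaching a link limit. The sketch below is an illustration only and is not part of the datasets library: `default_cache_dir` and `link_report` are hypothetical helper names, and the fallback path `~/.cache/huggingface/datasets` is assumed to be the default location when neither HF_DATASETS_CACHE nor HF_HOME is set.

```python
import os
from pathlib import Path


def default_cache_dir() -> Path:
    """Resolve the datasets cache path from the environment (assumed default otherwise)."""
    env = os.environ.get("HF_DATASETS_CACHE")
    if env:
        return Path(env).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "datasets"


def link_report(root: Path) -> dict:
    """Count symlinks under root and find the directory with the highest st_nlink."""
    symlinks = 0
    max_nlink = 0
    worst_dir = None
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        nlink = os.lstat(dirpath).st_nlink
        if nlink > max_nlink:
            max_nlink, worst_dir = nlink, dirpath
        for name in dirnames + filenames:
            if os.path.islink(os.path.join(dirpath, name)):
                symlinks += 1
    return {"symlinks": symlinks, "max_dir_nlink": max_nlink, "worst_dir": worst_dir}


if __name__ == "__main__":
    print(link_report(default_cache_dir()))
```

Comparing `max_dir_nlink` with the limit of the filesystem that hosts the cache gives a rough indication of how close the directory is to the failure. Some filesystems do not track subdirectories in `st_nlink`, so a low value there is not conclusive.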
Official Documentation
https://huggingface.co/docs/datasets/cache
Workarounds
- 90% success: Remove the entire datasets cache directory. Calling datasets.disable_caching(), which supersedes datasets.set_caching_enabled(False), does not delete existing files; it only stops the library from reloading cached processing results.
- 85% success: Set HF_DATASETS_CACHE to a directory on a filesystem with a higher link limit (e.g., tmpfs or XFS).
- 75% success: Use streaming mode (streaming=True in load_dataset) to avoid caching entirely.
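The three workarounds above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official procedure: `remove_cache_tree`, `relocate_cache`, and `stream_first_rows` are hypothetical helper names, the target paths are whatever the caller passes in, and the streaming helper needs the datasets package and network access. HF_DATASETS_CACHE is read when datasets is imported, so the relocation has to happen before that import.

```python
import os
import shutil
from pathlib import Path


def remove_cache_tree(cache_dir: Path) -> bool:
    """Delete the whole cache tree. Returns True if a directory was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


def relocate_cache(new_dir: Path) -> Path:
    """Point HF_DATASETS_CACHE at new_dir. Call this before importing datasets."""
    new_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = str(new_dir)
    return new_dir


def stream_first_rows(name: str, split: str = "train", n: int = 3) -> list:
    """Read the first n rows in streaming mode instead of building a local copy."""
    # Imported lazily so that a prior relocate_cache() call takes effect.
    from datasets import load_dataset

    ds = load_dataset(name, split=split, streaming=True)
    return list(ds.take(n))
```

A typical order would be to call `relocate_cache` first, then `remove_cache_tree` on the old location, and to fall back to `stream_first_rows` only when no filesystem with a higher link limit is available.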
Dead Ends
Common approaches that don't work:
- 80% fail: Deleting individual files does not reduce the symlink count sufficiently; the entire cache tree must be removed.
- 95% fail: Reinstalling the datasets package does not affect the existing cache directory structure.