{
  "id": "llm/llama-index-pipeline-cache-miss",
  "signature": "llama_index.core.ingestion.pipeline.IngestionCacheMiss: Cache miss for node 'node_abc123'. Re-processing.",
  "signature_zh": "llama_index.core.ingestion.pipeline.IngestionCacheMiss: 节点 'node_abc123' 缓存未命中，正在重新处理。",
  "regex": "llama_index\\.core\\.ingestion\\.pipeline\\.IngestionCacheMiss.*Cache miss for node",
  "domain": "llm",
  "category": "runtime_error",
  "subcategory": null,
  "root_cause": "The LlamaIndex ingestion pipeline cache misses when a document's hash changes (e.g., due to metadata updates or text normalization). The pipeline then no longer recognizes previously processed nodes and re-runs the expensive chunking and embedding steps for them.",
  "root_cause_type": "generic",
  "root_cause_zh": "当文档哈希值发生变化时（例如，由于元数据更新或文本规范化），LlamaIndex 摄取管道的缓存无法命中。管道因此无法识别已处理过的节点，并对它们重新运行昂贵的分块和嵌入步骤。",
  "versions": [
    {
      "version": "llama-index==0.10.43",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "llama-index-core==0.11.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    }
  ],
  "os_specific": {},
  "dead_ends": [
    {
      "action": "Disable the ingestion cache entirely so that every run processes all documents from scratch.",
      "why_fails": "This eliminates all performance benefits of caching and causes the pipeline to re-process every document on every run, which is impractical for large datasets.",
      "fail_rate": 0.9,
      "condition": "",
      "sources": []
    },
    {
      "action": "Clear the ingestion cache and re-run the pipeline to rebuild it.",
      "why_fails": "This is a temporary fix that doesn't address the root cause (hash changes). The cache will miss again on the next run if the document source is still being modified.",
      "fail_rate": 0.85,
      "condition": "",
      "sources": []
    },
    {
      "action": "Replace the cache's hashing logic with a custom hash function.",
      "why_fails": "Custom hash functions are not supported in the current LlamaIndex cache implementation; attempting to override requires monkey-patching internal methods, which breaks on version updates.",
      "fail_rate": 0.95,
      "condition": "",
      "sources": []
    }
  ],
  "workarounds": [
    {
      "action": "Normalize document text before ingestion so the document hash stays stable across runs.",
      "success_rate": 0.85,
      "how": "Normalize document texts (e.g., lowercase them and trim whitespace) before passing them to the pipeline, and apply the same normalization on every run so that unchanged documents produce identical hashes. Example, assuming `docs` holds the normalized documents and `pipeline` is an existing `IngestionPipeline`: `from llama_index.core.node_parser import SimpleNodeParser; parser = SimpleNodeParser.from_defaults(); nodes = parser.get_nodes_from_documents(docs); pipeline.run(nodes=nodes, in_place=True)`.",
      "condition": "",
      "sources": []
    },
    {
      "action": "Persist the ingestion cache to a directory outside the project folder so that deployments do not wipe it.",
      "success_rate": 0.8,
      "how": "Persist the pipeline cache to a directory outside the project folder: call `pipeline.persist('/data/cache/ingestion_cache')` after a run and restore it with `pipeline.load('/data/cache/ingestion_cache')` before the next run, so the cache survives deployments.",
      "condition": "",
      "sources": []
    },
    {
      "action": "Exclude volatile metadata fields such as 'last_modified' or 'version' from the cache key.",
      "success_rate": 0.7,
      "how": "Implement a custom cache key by subclassing `IngestionCache` and overriding its key computation so that metadata fields like 'last_modified' or 'version' are ignored. The `_get_cache_key` method named for this is not a documented hook, so verify that it exists in your installed version. This approach depends on library internals and can break on version updates.",
      "condition": "",
      "sources": []
    }
  ],
  "workarounds_zh": [
    "通过在摄取前规范化文本来保持文档哈希稳定：在将文档传入管道之前规范化文本（例如，小写化、去除首尾空白），并在每次运行时使用相同的规范化方式，使未更改的文档产生相同的哈希。示例（假设 `docs` 为已规范化的文档，`pipeline` 为已有的 `IngestionPipeline`）：`from llama_index.core.node_parser import SimpleNodeParser; parser = SimpleNodeParser.from_defaults(); nodes = parser.get_nodes_from_documents(docs); pipeline.run(nodes=nodes, in_place=True)`。",
    "将管道缓存持久化到项目文件夹之外的目录：运行后调用 `pipeline.persist('/data/cache/ingestion_cache')`，并在下次运行前调用 `pipeline.load('/data/cache/ingestion_cache')` 恢复，以避免部署期间缓存被清除。",
    "通过继承 `IngestionCache` 并重写其缓存键计算逻辑来实现自定义缓存键，以忽略 'last_modified' 或 'version' 等元数据字段。此处提到的 `_get_cache_key` 方法并非文档化的接口，请先确认所安装的版本中存在该方法。该做法依赖库的内部实现，版本更新时可能失效。"
  ],
  "transition_graph": {
    "leads_to": [],
    "preceded_by": [],
    "frequently_confused_with": []
  },
  "official_doc_url": "https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline.html#caching",
  "official_doc_section": null,
  "error_code": "LLAMA-ERR-0091",
  "verification_tier": "ai_generated",
  "confidence": 0.82,
  "fix_success_rate": 0.78,
  "resolvable": "partial",
  "first_seen": "2024-09-20",
  "last_confirmed": "2024-06-01",
  "last_updated": "2024-06-01",
  "evidence_count": 1,
  "tags": [],
  "locale": "en",
  "aliases": []
}