# llama_index.core.ingestion.pipeline.IngestionCacheMiss: Cache miss for node 'node_abc123'. Re-processing.

- **ID:** `llm/llama-index-pipeline-cache-miss`
- **Domain:** llm
- **Category:** runtime_error
- **Error Code:** `LLAMA-ERR-0091`
- **Verification:** ai_generated
- **Fix Rate:** 78%

## Root Cause

The LlamaIndex ingestion pipeline reports a cache miss when a document's hash changes between runs, for example after a metadata update or a change in text normalization. The cache lookup then fails for nodes that were already processed, and the pipeline re-runs the expensive chunking and embedding steps for them.
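
The effect can be illustrated with a minimal standard-library sketch. It assumes a simplified cache key (SHA-256 over the text plus all metadata); the `content_hash` helper and the sample metadata are hypothetical and are not LlamaIndex's actual implementation.

```python
import hashlib
import json


def content_hash(text: str, metadata: dict) -> str:
    """Illustrative cache key: SHA-256 over the text plus all metadata (an assumption)."""
    payload = text + json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


text = "LlamaIndex caches the output of each transformation."

# First run: the result is stored under the hash of text + metadata.
first_run = content_hash(text, {"source": "guide.md", "last_modified": "2024-05-01"})
cache = {first_run: "embedding-from-first-run"}

# Second run: same text, but a volatile metadata field changed.
second_run = content_hash(text, {"source": "guide.md", "last_modified": "2024-05-02"})

# Third run: same metadata, but the text gained a trailing space.
whitespace_run = content_hash(text + " ", {"source": "guide.md", "last_modified": "2024-05-01"})

print(second_run in cache)      # False: the metadata change alone causes a miss
print(whitespace_run in cache)  # False: so does a whitespace-only text change
```

Any input that feeds the hash, including fields that have no bearing on the embedding, can therefore invalidate an otherwise reusable cache entry.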

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| llama-index==0.10.43 | active | — | — |
| llama-index-core==0.11.0 | active | — | — |

## Workarounds

1. **Stabilize the document hash by normalizing text before ingestion.** Normalize each document's text (e.g., lowercase it and trim whitespace) before it reaches the pipeline, so that identical content always produces the same hash. The original report also cites `pipeline.add_documents(documents, hash_ids=True)`; confirm that this call exists in your installed version before relying on it. (85% success)
   ```python
   from llama_index.core.node_parser import SimpleNodeParser

   # `docs` holds the already-normalized documents; `pipeline` is an existing IngestionPipeline
   parser = SimpleNodeParser.from_defaults()
   nodes = parser.get_nodes_from_documents(docs)
   pipeline.run(nodes=nodes, in_place=True)
   ```
2. **Use a persistent cache directory outside the project folder.** Keeping the cache outside the deployed tree prevents it from being wiped during deployments. The `persist_path` constructor argument below is quoted as reported; verify it against the `IngestionCache` API for your version, which also provides `persist(...)` and `from_persist_path(...)`. (80% success)
   ```python
   from llama_index.core.ingestion import IngestionCache, IngestionPipeline

   # As reported; check that IngestionCache accepts `persist_path` in your version
   pipeline = IngestionPipeline(
       cache=IngestionCache(persist_path='/data/cache/ingestion_cache')
   )
   ```
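3. **Implement a custom cache key function.** Subclass `IngestionCache` and override the `_get_cache_key` method (name as reported) so that the key ignores volatile metadata fields such as `last_modified` or `version`. Because this overrides a private method, re-check it after every LlamaIndex upgrade. (70% success)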
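
The ideas behind workarounds 1 and 3 can be sketched together using only the standard library. This is a minimal illustration under stated assumptions: `normalize_text`, `stable_key`, and `VOLATILE_KEYS` are hypothetical names, and the key format is not the one LlamaIndex uses internally.

```python
import hashlib
import json
import re
import unicodedata

# Metadata fields assumed to change between runs without affecting content
# (hypothetical set, taken from the examples in workaround 3).
VOLATILE_KEYS = {"last_modified", "version"}


def normalize_text(text: str) -> str:
    """Apply Unicode NFC, collapse whitespace runs, trim, and lowercase."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def stable_key(text: str, metadata: dict) -> str:
    """Hash normalized text plus only the non-volatile metadata fields."""
    stable_meta = {k: v for k, v in metadata.items() if k not in VOLATILE_KEYS}
    payload = normalize_text(text) + json.dumps(stable_meta, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = stable_key("  Hello   World\n", {"source": "a.md", "last_modified": "2024-05-01"})
key_b = stable_key("hello world", {"source": "a.md", "last_modified": "2024-06-30", "version": 3})
key_c = stable_key("hello world", {"source": "b.md"})

print(key_a == key_b)  # True: only whitespace, case, and volatile fields differ
print(key_a == key_c)  # False: a non-volatile field (source) differs
```

In a real pipeline the same normalization and metadata filtering would be applied to each document before calling `pipeline.run(...)`, so the hash the library computes stays stable. Note that lowercasing also changes the text that gets embedded, so apply it only if that is acceptable for retrieval quality.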

## Dead Ends

- **Disabling the cache entirely:** This eliminates all performance benefits of caching and causes the pipeline to re-process every document on every run, which is impractical for large datasets. (90% fail)
- **One-off fixes that leave the hash unstable:** These are temporary and do not address the root cause (hash changes). The cache will miss again on the next run if the document source is still being modified. (85% fail)
- **Supplying a custom hash function:** Custom hash functions are not supported in the current LlamaIndex cache implementation. Overriding the hash requires monkey-patching internal methods, which breaks on version updates. (95% fail)
