# chromadb.errors.InternalError: Index corruption detected. Rebuild required.

- **ID:** `llm/embedding-vector-index-corruption-after-reindex`
- **Domain:** llm
- **Category:** data_error
- **Error Code:** `CHROMA-ERR-0042`
- **Verification:** ai_generated
- **Fix Rate:** 82%

## Root Cause

ChromaDB index files become corrupted when a reindex operation is interrupted by a crash or network disconnect, leaving the HNSW graph in an inconsistent state.

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| chromadb==0.4.22 | active | — | — |
| chromadb==0.5.0 | active | — | — |
| langchain-chroma==0.1.0 | active | — | — |

## Workarounds

1. **Delete the corrupted collection, recreate it, and re-ingest the source documents.** For production, keep a backup of the source documents in separate storage (e.g., S3) together with a script that re-embeds them. (95% success)
   ```
   client.delete_collection('my_collection')
   collection = client.create_collection('my_collection')
   # Re-embed all source documents and add them to the new collection.
   ```
2. **Run ChromaDB's persistence check, then reset the affected collection.** Start the server with debug output and look for `HNSW index integrity check failed` in the log, then repair from the Python client (requires admin access). The repair call goes through private attributes (`_client`, `_admin_client`) that are not part of the documented public API, so confirm it exists in your installed version before relying on it. (80% success)
   ```
   # Shell: start the server against the persistence directory
   chroma run --path /path/to/persist --debug

   # Python: reset only the affected collection
   collection._client._admin_client.reset_collection('my_collection')
   ```
3. **Validate periodically and snapshot the persistence directory before any reindex.** Schedule a cron job that runs the validation, and copy the persistence directory before starting a reindex operation so that an interrupted run can be rolled back. Note that `chromadb.api.types.validate_metadata` validates the metadata dictionaries passed to it; it does not inspect the HNSW graph itself, so treat it as a partial check only. (75% success)
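
The snapshot and re-ingest steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive recovery procedure: the helper names `snapshot_persist_dir` and `rebuild_collection`, the timestamped backup folder layout, and the batch size are all hypothetical. It assumes the caller still has the source documents and their IDs, and that nothing is writing to the persistence directory while it is copied.

```python
import shutil
import time
from pathlib import Path


def snapshot_persist_dir(persist_dir, backup_root):
    """Copy the persistence directory to a timestamped folder and return its path.

    Hypothetical helper: run it before a reindex, while no process is
    writing to `persist_dir`.
    """
    src = Path(persist_dir)
    dest = Path(backup_root) / f"{src.name}-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(src, dest)  # raises if `dest` already exists
    return dest


def rebuild_collection(client, name, ids, documents, batch_size=100):
    """Drop the collection `name`, recreate it, and re-add documents in batches.

    `client` is expected to behave like a ChromaDB client. No embeddings are
    passed, so the collection's embedding function has to produce them.
    """
    client.delete_collection(name)
    collection = client.create_collection(name)
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(ids=ids[start:end], documents=documents[start:end])
    return collection
```

With a real client this might be called as `rebuild_collection(chromadb.PersistentClient(path="/path/to/persist"), "my_collection", ids, documents)`, after taking a snapshot with `snapshot_persist_dir("/path/to/persist", "/path/to/backups")`. The backup location is an example path, not one from the original report.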

## Dead Ends

- **Restarting the ChromaDB server or client:** The corrupted HNSW graph persists on disk. A restart does not repair the structural damage; it only loads the same corrupted files again. (95% fail)
- **Calling `reset()` on the client:** `reset()` wipes all data, not just the corrupted index, so unrelated collections lose their embeddings as well. (98% fail)
- **Rebuilding the index without a backup of the source documents:** If the original source data is lost or was never backed up, the index cannot be recreated. Rebuilding only works when the raw documents are available to re-embed. (70% fail)
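
To avoid the last dead end, the stored documents can be exported ahead of time, while the collection is still healthy (for example, right before a reindex). The sketch below is one possible approach, not a built-in ChromaDB feature: the helper name `export_collection` and the JSON Lines layout are hypothetical. It assumes the documents were stored in the collection alongside their embeddings and that metadata values are JSON-serializable.

```python
import json


def export_collection(collection, out_path):
    """Write each record's id, document, and metadata to a JSON Lines file.

    `collection` is expected to behave like a ChromaDB collection whose
    `get()` returns a dict with "ids", "documents", and "metadatas".
    Returns the number of records written.
    """
    data = collection.get(include=["documents", "metadatas"])
    ids = data["ids"]
    # Either field may be missing or None if nothing was stored for it.
    documents = data.get("documents") or [None] * len(ids)
    metadatas = data.get("metadatas") or [None] * len(ids)
    with open(out_path, "w", encoding="utf-8") as fh:
        for doc_id, doc, meta in zip(ids, documents, metadatas):
            record = {"id": doc_id, "document": doc, "metadata": meta}
            fh.write(json.dumps(record) + "\n")
    return len(ids)
```

The resulting file can be copied to separate storage and later read back line by line to feed a re-ingest script. For large collections, paging through `get()` with `limit` and `offset` would likely be preferable to loading everything at once.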
