{
  "id": "llm/embedding-vector-index-corruption-after-reindex",
  "signature": "chromadb.errors.InternalError: Index corruption detected. Rebuild required.",
  "signature_zh": "chromadb.errors.InternalError: 检测到索引损坏，需要重建。",
  "regex": "chromadb\\.errors\\.InternalError.*Index corruption detected",
  "domain": "llm",
  "category": "data_error",
  "subcategory": null,
  "root_cause": "ChromaDB index files become corrupted when a reindex operation is interrupted by a crash or network disconnect, leaving the HNSW graph in an inconsistent state.",
  "root_cause_type": "generic",
  "root_cause_zh": "当重建索引操作因崩溃或网络断开而中断时，ChromaDB 索引文件损坏，导致 HNSW 图处于不一致状态。",
  "versions": [
    {
      "version": "chromadb==0.4.22",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "chromadb==0.5.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "langchain-chroma==0.1.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    }
  ],
  "os_specific": {},
  "dead_ends": [
    {
      "action": "Restart the ChromaDB server (or the Python process holding the client)",
      "why_fails": "The corrupted HNSW files persist on disk, so a restart simply reloads the same damaged graph; nothing in the startup path repairs the structural damage.",
      "fail_rate": 0.95,
      "condition": "",
      "sources": []
    },
    {
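      "action": "Call client.reset() to clear the corrupted index",
      "why_fails": "client.reset() wipes all data in the instance, not just the corrupted index, destroying the embeddings of every unrelated collection. It is a nuclear option, and it is disabled unless the client was created with Settings(allow_reset=True).",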
      "fail_rate": 0.98,
      "condition": "",
      "sources": []
    },
    {
      "action": "Rebuild the index by re-embedding the original documents",
      "why_fails": "Rebuilding requires the raw source documents so they can be re-embedded. If that source data is lost or was never backed up, the index cannot be recreated.",
      "fail_rate": 0.7,
      "condition": "Original source documents are not retained or backed up",
      "sources": []
    }
  ],
  "workarounds": [
    {
      "action": "Delete the corrupted collection and re-ingest its source documents",
      "success_rate": 0.95,
      "how": "Identify the collection whose index is corrupted, drop it with client.delete_collection('my_collection'), recreate it with client.create_collection('my_collection') using the same embedding function and collection metadata (e.g., hnsw:space) as before, then re-embed and re-add all documents with collection.add(). For production, keep a copy of the source documents in separate storage (e.g., S3) together with a script that re-embeds them.",
      "condition": "",
      "sources": []
    },
    {
      "action": "Locate the failing collection from the server logs, then drop and rebuild only that collection",
      "success_rate": 0.8,
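      "how": "Start the server against the persistence directory with 'chroma run --path /path/to/persist' and watch the startup and query logs for HNSW segment load errors to identify which collection is affected. ChromaDB offers no documented public call for repairing an HNSW index in place (private attributes such as collection._client are not a supported repair path), so delete and recreate only that collection with client.delete_collection / client.create_collection and re-ingest its documents, leaving the other collections untouched.",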
      "condition": "",
      "sources": []
    },
    {
      "action": "Run periodic index health checks and snapshot the persistence directory before any reindex",
      "success_rate": 0.75,
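      "how": "Schedule a cron job that opens each collection and runs a small nearest-neighbor query (e.g., collection.query(query_embeddings=[probe_vector], n_results=1), where probe_vector is any vector with the collection's embedding dimension) so HNSW load errors surface before users hit them. Before any reindex or bulk write, stop writes and copy the persistence directory so a known-good snapshot can be restored if the operation is interrupted.",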
      "condition": "",
      "sources": []
    }
  ],
  "workarounds_zh": [
    "识别索引损坏的集合，使用 client.delete_collection('my_collection') 删除，再用 client.create_collection('my_collection') 重新创建（使用与之前相同的嵌入函数和集合元数据，例如 hnsw:space），然后通过 collection.add() 重新嵌入并添加所有文档。对于生产环境，将源文档备份到独立存储（如 S3），并维护一个重新嵌入的脚本。",
    "使用 'chroma run --path /path/to/persist' 针对持久化目录启动服务器，并在启动和查询日志中查找 HNSW 段加载错误，以确定受影响的集合。ChromaDB 没有用于原地修复 HNSW 索引的公开文档化接口（collection._client 等私有属性不是受支持的修复途径），因此只需使用 client.delete_collection / client.create_collection 删除并重新创建该集合并重新摄取其文档，其他集合保持不变。",
    "设置一个 cron 任务，定期打开每个集合并执行一次小规模最近邻查询（例如 collection.query(query_embeddings=[probe_vector], n_results=1)，其中 probe_vector 为与集合嵌入维度相同的任意向量），以便在用户遇到之前发现 HNSW 加载错误。在任何重建索引或批量写入操作之前，先停止写入并复制持久化目录，以便操作中断时可以恢复到已知良好的快照。"
  ],
  "transition_graph": {
    "leads_to": [],
    "preceded_by": [],
    "frequently_confused_with": []
  },
  "official_doc_url": "https://docs.trychroma.com/troubleshooting#index-corruption",
  "official_doc_section": null,
  "error_code": "CHROMA-ERR-0042",
  "verification_tier": "ai_generated",
  "confidence": 0.85,
  "fix_success_rate": 0.82,
  "resolvable": "true",
  "first_seen": "2024-06-15",
  "last_confirmed": "2024-06-01",
  "last_updated": "2024-06-01",
  "evidence_count": 1,
  "tags": [],
  "locale": "en",
  "aliases": []
}