llm data_error ai_generated true

ValueError: Embedding dimension mismatch: index has dimension 1536 but new embeddings have dimension 768. Rebuild index or set allow_dangerous_deserialization=True.

ID: llm/llamaindex-embedding-dim-mismatch-update

Fix rate: 90%
Confidence: 87%
Evidence count: 1
First seen: 2024-02-28

Version Compatibility

Version                    Status   Introduced   Deprecated   Notes
llama-index 0.10.0         active
text-embedding-ada-002     active
text-embedding-3-small     active
OpenAI API 2024-02-15      active

Root Cause Analysis

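A LlamaIndex vector store index was built with one embedding model (e.g., text-embedding-ada-002, 1536 dimensions) but is being updated with embeddings of a different size, here 768 dimensions. A vector index stores vectors of a single fixed length, so it rejects any new embedding whose length differs. Note that text-embedding-3-small returns 1536 dimensions by default; it produces 768-dimensional vectors only when the `dimensions` parameter is set to 768, so the mismatch can also come from that setting or from a different 768-dimensional model.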
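
The check that fails can be sketched as a simple length comparison. This is a minimal illustration only, not LlamaIndex's actual code: `check_embedding_dim`, `INDEX_DIM`, and `new_embedding` are hypothetical names chosen to mirror the error message above.

```python
from typing import Sequence


def check_embedding_dim(index_dim: int, embedding: Sequence[float]) -> None:
    """Raise ValueError if the embedding length differs from the index dimension."""
    if len(embedding) != index_dim:
        raise ValueError(
            f"Embedding dimension mismatch: index has dimension {index_dim} "
            f"but new embeddings have dimension {len(embedding)}."
        )


# Hypothetical values mirroring the error above.
INDEX_DIM = 1536
new_embedding = [0.0] * 768

try:
    check_embedding_dim(INDEX_DIM, new_embedding)
except ValueError as exc:
    print(exc)
```

Running a guard like this before inserting into an existing index surfaces the mismatch early, before any partial update is written.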

generic

Official Documentation

https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index.html

Solutions

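  1. Rebuild the index from scratch with the new embedding model. Code example: `from llama_index.core import VectorStoreIndex, SimpleDirectoryReader; from llama_index.embeddings.openai import OpenAIEmbedding; documents = SimpleDirectoryReader('data').load_data(); index = VectorStoreIndex.from_documents(documents, embed_model=OpenAIEmbedding(model='text-embedding-3-small')); index.storage_context.persist('new_index')`. Here `embed_model` is given an embedding object, since it expects one rather than a bare OpenAI model name.
  2. Create a new collection with the correct dimension in the vector database (e.g., Chroma) and re-insert all documents.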
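
Both solutions come down to re-embedding every document with a single model and writing the results to a fresh store. A minimal, library-agnostic sketch of that idea follows. It uses a plain dictionary in place of a real vector database, and `rebuild_collection`, `toy_embed`, and `docs` are hypothetical names introduced for illustration; in practice `embed_fn` would wrap the real embedding model.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def rebuild_collection(
    documents: Dict[str, str],
    embed_fn: Callable[[str], Sequence[float]],
    expected_dim: int,
) -> Dict[str, Vector]:
    """Re-embed every document with one model and return a fresh collection."""
    new_collection: Dict[str, Vector] = {}
    for doc_id, text in documents.items():
        vector = list(embed_fn(text))
        # Refuse to store anything that does not match the target dimension.
        if len(vector) != expected_dim:
            raise ValueError(
                f"{doc_id}: expected dimension {expected_dim}, got {len(vector)}"
            )
        new_collection[doc_id] = vector
    return new_collection


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding call; always returns 3 values."""
    return [float(len(text)), float(text.count(" ")), 1.0]


docs = {"a": "hello world", "b": "vector stores"}
collection = rebuild_collection(docs, toy_embed, expected_dim=3)
print(collection)
```

The old index is left untouched until the new collection is complete, so the switch can be made in one step once every document has been re-embedded.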

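Ineffective Attempts

Common approaches that do not work:

  1. 100% failure rate

    The allow_dangerous_deserialization=True flag only permits loading a potentially malicious pickle file; it does not fix the dimension mismatch. The index will still reject new embeddings.

  2. 95% failure rate

    Padding or truncating the new embeddings to match the index dimension corrupts the embedding space, leading to meaningless similarity scores and broken retrieval.

  3. 60% failure rate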

    OpenAI deprecated text-embedding-ada-002; the old model may still be accessible but returns different embeddings over time due to model updates.
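
To illustrate why the truncation approach in item 2 breaks retrieval, here is a toy sketch with hand-picked 4-dimensional vectors (`query` and `doc` are made-up values, not real embeddings). Two vectors that are orthogonal at full length look identical once their trailing components are cut off.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: they agree on the first component and disagree on the last.
query = [1.0, 0.0, 0.0, 1.0]
doc = [1.0, 0.0, 0.0, -1.0]

full = cosine(query, doc)                # similarity over all 4 components
truncated = cosine(query[:2], doc[:2])   # similarity after dropping the last 2
print(full, truncated)
```

Making the shapes match removes the error but not the underlying problem: the similarity score no longer reflects the original vectors, and vectors from two different models do not share a coordinate system in the first place.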