llm data_error ai_generated true

ValueError: Embedding dimension mismatch: index has dimension 1536 but new embeddings have dimension 768. Rebuild index or set allow_dangerous_deserialization=True.

ID: llm/llamaindex-embedding-dim-mismatch-update

Fix Rate: 90%
Confidence: 87%
Evidence: 1
First Seen: 2024-02-28

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
llama-index 0.10.0	active	—	—	—
text-embedding-ada-002	active	—	—	—
text-embedding-3-small	active	—	—	—
OpenAI API 2024-02-15	active	—	—	—

Root Cause

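A LlamaIndex vector store index was built with one embedding model (e.g., text-embedding-ada-002, 1536 dimensions) but is being updated with embeddings from a model that produces vectors of a different size (e.g., a 768-dimensional model, or text-embedding-3-small called with `dimensions=768`; by default text-embedding-3-small also returns 1536 dimensions). The index's dimension is fixed by the vectors it already holds, so vectors of another length cannot be added. Even at equal dimension, vectors from different models live in different spaces and must not be mixed in one index.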
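A length check before inserting turns this into an early, explicit failure. The sketch below is a hypothetical plain-Python helper (`check_embedding_dims` is not a LlamaIndex API); it assumes you know the index dimension and have the new vectors as lists of floats. It catches size mismatches only.

```python
from typing import Sequence


def check_embedding_dims(index_dim: int, embeddings: Sequence[Sequence[float]]) -> None:
    """Hypothetical guard: raise if any new vector's length differs from the index dimension."""
    for i, vec in enumerate(embeddings):
        if len(vec) != index_dim:
            raise ValueError(
                f"embedding {i} has dimension {len(vec)}, "
                f"but the index was built with dimension {index_dim}; "
                "rebuild the index with the new model instead of inserting"
            )


# Index built with a 1536-dim model, new batch from a 768-dim model.
try:
    check_embedding_dims(1536, [[0.0] * 768])
except ValueError as err:
    print(err)
```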

Official Documentation

https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index.html

Workarounds

95% success Rebuild the index from scratch with the new embedding model, re-embedding every document:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding  # package: llama-index-embeddings-openai

documents = SimpleDirectoryReader("data").load_data()
# Pass an embedding object: in llama-index 0.10 a bare model-name string is not accepted here.
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
index.storage_context.persist(persist_dir="new_index")  # new directory; old index left untouched
```
When reloading the index later, pass the same embed_model so queries are embedded with the model that built it.
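Before rebuilding, it can help to confirm which dimension the existing index actually stores. This sketch assumes the index was persisted with LlamaIndex's default in-memory vector store, which writes `default__vector_store.json` containing an `embedding_dict` that maps node IDs to vectors; other vector stores (Chroma, FAISS, and so on) keep vectors elsewhere. `persisted_index_dim` is a hypothetical helper, not a library function.

```python
import json
from pathlib import Path
from typing import Optional


def persisted_index_dim(persist_dir: str) -> Optional[int]:
    """Return the vector length in a persisted default vector store, or None if it is empty.

    Assumption: <persist_dir>/default__vector_store.json holds an "embedding_dict"
    mapping node IDs to lists of floats.
    """
    path = Path(persist_dir) / "default__vector_store.json"
    data = json.loads(path.read_text(encoding="utf-8"))
    for vector in data.get("embedding_dict", {}).values():
        return len(vector)  # assumes all stored vectors share one length
    return None
```

For example, `persisted_index_dim("storage")` returning 1536 while the new model emits 768-dimensional vectors confirms that a rebuild is needed.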
90% success Create a new collection in the vector database (e.g., Chroma) for the new embedding model and re-insert all documents, re-embedding them with that model. The old collection cannot be reused because its dimension was fixed by the original model.
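The collection-swap workaround can be sketched without a real database. `ToyCollection` and `migrate` are hypothetical stand-ins, not Chroma's API: the toy collection fixes its dimension on the first insert (Chroma behaves similarly and rejects later vectors of another size), and the migration re-embeds the original text with the new model rather than touching the old vectors.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical embedder type: maps text to a vector.
Embedder = Callable[[str], List[float]]


class ToyCollection:
    """Hypothetical in-memory collection; the first insert fixes its dimension."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dim: Optional[int] = None
        self.vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: List[float]) -> None:
        if self.dim is None:
            self.dim = len(vector)  # dimension locked by the first vector
        elif len(vector) != self.dim:
            raise ValueError(f"{self.name}: expected dim {self.dim}, got {len(vector)}")
        self.vectors[doc_id] = vector


def migrate(docs: Dict[str, str], new_name: str, embed: Embedder) -> ToyCollection:
    """Re-embed every source document with the new model into a fresh collection."""
    target = ToyCollection(new_name)
    for doc_id, text in docs.items():
        target.add(doc_id, embed(text))
    return target


docs = {"a": "first document", "b": "second document"}

old = ToyCollection("docs_ada002")
for doc_id in docs:
    old.add(doc_id, [0.0] * 1536)  # placeholder for the original 1536-dim vectors

# Placeholder 768-dim embedder standing in for the new model.
new = migrate(docs, "docs_768", lambda text: [float(len(text))] * 768)
print(new.name, new.dim, sorted(new.vectors))
```

The old collection stays intact until the new one is verified, so switching over is a matter of pointing the application at the new collection name.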

Dead Ends

Common approaches that don't work:

100% fail
Setting allow_dangerous_deserialization=True: this flag only permits loading a pickle file that could contain malicious code. It does not change the stored vectors, so the index still rejects embeddings of the wrong dimension.
95% fail
Padding or truncating the new embeddings to the index's dimension: the lengths then match, but the two models place text in unrelated vector spaces, so similarity scores become meaningless and retrieval breaks.
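A toy example shows why padding fails even though it makes the lengths match. All vectors are invented for illustration: a 2-dimensional "model B" query for "cat" is zero-padded to fit a 4-dimensional "model A" index, and because model B uses its own axes, the nearest neighbor comes out as "car".

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_pad(vec: List[float], dim: int) -> List[float]:
    return vec + [0.0] * (dim - len(vec))


# Hypothetical 4-dim "model A" index.
index: Dict[str, List[float]] = {
    "cat": [1.0, 0.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0, 0.0],
}
# Hypothetical 2-dim "model B" embedding of the query "cat"; model B learned different axes.
query_b = [0.0, 1.0]

padded = zero_pad(query_b, 4)  # lengths now match, meaning does not
scores = {doc: cosine(padded, vec) for doc, vec in index.items()}
best = max(scores, key=scores.get)
print(best)  # → car
```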
60% fail
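Switching the new embeddings back to text-embedding-ada-002: this restores 1536-dimensional vectors in the same space as the index, but ada-002 is an older-generation model that OpenAI may retire, so pinning the index to it only postpones the rebuild.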