llm
data_error
ai_generated
true
chromadb.errors.DimensionError: Inserted embedding dimension (1536) does not match collection dimension (768)
ID: llm/embedding-length-mismatch-on-insert
Fix rate: 95%
Confidence: 90%
Evidence count: 1
First seen: 2023-11-05
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| chromadb>=0.4.0 | active | — | — | — |
| sentence-transformers>=2.2.0 | active | — | — | — |
| text-embedding-3-small | active | — | — | — |
| text-embedding-ada-002 | active | — | — | — |
Root cause analysis
The embedding model used for insertion produces vectors of a different size than the collection expects (here 1536 versus 768), typically because the embedding model was switched or the model versions do not match.
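The mismatch can be caught before the insert call with a simple length check. The following is a minimal sketch, not part of Chroma's API: `check_dimensions` is a hypothetical helper, and the batches are placeholder vectors sized to match the two dimensions in the error message.

```python
from typing import Sequence


def check_dimensions(embeddings: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {i} has dimension {len(vector)}, "
                f"but the collection expects {expected_dim}"
            )


# Placeholder vectors with the two sizes from the error message.
ok_batch = [[0.0] * 768, [0.1] * 768]
bad_batch = [[0.0] * 1536]

check_dimensions(ok_batch, expected_dim=768)  # passes silently

try:
    check_dimensions(bad_batch, expected_dim=768)
except ValueError as exc:
    print(exc)
```

Running a check like this just before adding to the collection turns the failure into an error raised in your own code, where the model that produced the batch is still known.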
Official documentation
https://docs.trychroma.com/usage-guide#creating-collections
Solutions
- Create a new collection with the correct dimension and re-embed all documents. Example: `collection = client.create_collection(name="my_collection", embedding_function=embedding_function, metadata={"hnsw:space": "cosine"})`, where `embedding_function` outputs 1536 dimensions.
- If using a different embedding model temporarily, keep a mapping of model to collection, or use a router that selects the correct collection based on the model.
- Use a unified embedding model that supports variable dimensions (e.g., text-embedding-3-small with `dimensions` parameter) to enforce consistency.
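The model-to-collection mapping from the second option could be sketched as follows. This is an illustrative assumption rather than an established pattern: `MODEL_DIMENSIONS` and `collection_name_for` are hypothetical names, the naming scheme is arbitrary, and `all-mpnet-base-v2` is included only as an example of a 768-dimensional sentence-transformers model.

```python
import re

# Output sizes of a few embedding models; extend this for the models you use.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,  # default size; reducible via `dimensions`
    "all-mpnet-base-v2": 768,
}


def collection_name_for(base: str, model: str) -> str:
    """Derive one collection name per embedding model and dimension."""
    if model not in MODEL_DIMENSIONS:
        raise KeyError(f"unknown embedding model: {model}")
    # Replace characters that may not be safe in a collection name.
    safe_model = re.sub(r"[^A-Za-z0-9_-]", "-", model)
    return f"{base}-{safe_model}-{MODEL_DIMENSIONS[model]}"


print(collection_name_for("docs", "text-embedding-ada-002"))
print(collection_name_for("docs", "all-mpnet-base-v2"))
```

The resulting name can then be passed to something like `client.get_or_create_collection(name=...)`, so vectors from different models never land in the same collection.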
Ineffective attempts
Common but ineffective approaches:
- 100% failure rate: The collection dimension is fixed once the collection holds data; upserting doesn't change the schema.
- 95% failure rate: Padding or truncating destroys semantic meaning and leads to poor retrieval results; the vector space becomes inconsistent.
- 90% failure rate: Different models have different output dimensions; you must use the same model for all inserts in a collection.