llm data_error ai_generated true

chromadb.errors.DimensionError: Inserted embedding dimension (1536) does not match collection dimension (768)

ID: llm/embedding-length-mismatch-on-insert

Fix rate: 95%
Confidence: 90%
Evidence count: 1
First seen: 2023-11-05

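Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| chromadb>=0.4.0 | active | | | |
| sentence-transformers>=2.2.0 | active | | | |
| text-embedding-3-small | active | | | |
| text-embedding-ada-002 | active | | | |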

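Root cause analysis

The embedding model used for insertion produces vectors of a different size than the dimension the collection expects. In the error above, the collection expects 768-dimensional vectors but the insert supplied 1536-dimensional ones. This usually happens after switching embedding models, or when the model version used for insertion differs from the one used to populate the collection.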
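
The mismatch described above can be caught before the vectors ever reach the database. The following is a minimal sketch, not part of Chroma's API: `check_dimensions` and `EmbeddingDimensionMismatch` are hypothetical helper names, and the expected dimension (768) is taken from the error message in this entry.

```python
from typing import Sequence


class EmbeddingDimensionMismatch(ValueError):
    """Raised when a vector's length differs from the collection's dimension."""


def check_dimensions(embeddings: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise if any embedding does not have exactly `expected_dim` entries."""
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise EmbeddingDimensionMismatch(
                f"embedding {index} has dimension {len(vector)}, "
                f"but the collection expects {expected_dim}"
            )


# A 768-dimensional collection receiving one 1536-dimensional vector.
good_batch = [[0.0] * 768]
bad_batch = [[0.0] * 768, [0.0] * 1536]

check_dimensions(good_batch, expected_dim=768)  # passes silently

try:
    check_dimensions(bad_batch, expected_dim=768)
except EmbeddingDimensionMismatch as exc:
    message = str(exc)
    print(message)
```

Running a check like this right before the insert call turns a database-side error into an application-side one that names the offending vector.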

generic

Official documentation

https://docs.trychroma.com/usage-guide#creating-collections

Solutions

  1. Create a new collection whose embedding function produces the dimension you need, then re-embed all documents into it. Example: `collection = client.create_collection(name="my_collection", embedding_function=embedding_function, metadata={"hnsw:space": "cosine"})`, where `embedding_function` outputs 1536-dimensional vectors.
  2. If you need to use a different embedding model temporarily, keep a mapping from each model to its own collection, or use a router that selects the correct collection based on the model.
  3. Standardize on a single embedding model for the collection. A model that supports configurable output dimensions (e.g., text-embedding-3-small with the `dimensions` parameter) lets you pin the vector size explicitly and keep it consistent.
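
The model-to-collection mapping from solution 2 could be sketched roughly as follows. This is an illustration under stated assumptions rather than a Chroma feature: `CollectionTarget`, `ROUTES`, and `select_collection` are hypothetical names, the collection names are invented, and `all-mpnet-base-v2` is used only as an example of a 768-dimensional sentence-transformers model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CollectionTarget:
    collection_name: str  # hypothetical collection name
    dimension: int        # vector size that collection was built with


# Hypothetical registry: one collection per embedding model.
ROUTES = {
    "text-embedding-ada-002": CollectionTarget("docs_ada_002", 1536),
    "all-mpnet-base-v2": CollectionTarget("docs_mpnet", 768),
}


def select_collection(model_name: str, vector: Sequence[float]) -> str:
    """Return the collection name for `model_name`, verifying the vector size."""
    target = ROUTES.get(model_name)
    if target is None:
        raise KeyError(f"no collection registered for model {model_name!r}")
    if len(vector) != target.dimension:
        raise ValueError(
            f"{model_name!r} vector has dimension {len(vector)}, "
            f"expected {target.dimension}"
        )
    return target.collection_name


# Usage: route a 768-dimensional vector to its matching collection.
chosen = select_collection("all-mpnet-base-v2", [0.0] * 768)
print(chosen)
```

The returned name would then be passed to whatever call fetches the collection in your client code. Keeping the dimension next to the name means a misrouted vector fails before any insert is attempted.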

Ineffective attempts

Common but ineffective approaches:

  1. Fails 100% of the time

    The collection dimension is fixed at creation time; upserting doesn't change the schema.

  2. Fails 95% of the time

    Padding or truncating destroys semantic meaning and leads to poor retrieval results; the vector space becomes inconsistent.

  3. Fails 90% of the time

    Different models have different output dimensions; you must use the same model for all inserts in a collection.
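
To illustrate why naive truncation (item 2 above) hurts retrieval, here is a toy sketch. The 4-dimensional vectors are made up for the demonstration and are not output from any real embedding model; `cosine` is a plain helper written for this example.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors: doc_a matches the query, doc_b differs only in the last entry.
query = [3.0, 0.0, 0.0, 4.0]
doc_a = [3.0, 0.0, 0.0, 4.0]
doc_b = [3.0, 0.0, 0.0, -4.0]

# Full vectors: the two documents are clearly distinguishable.
full_a = cosine(query, doc_a)
full_b = cosine(query, doc_b)

# Truncated to the first two entries: the distinguishing entry is gone.
truncated_a = cosine(query[:2], doc_a[:2])
truncated_b = cosine(query[:2], doc_b[:2])

print(full_a, full_b)
print(truncated_a, truncated_b)
```

With the full vectors, `doc_a` scores far higher than `doc_b`. After truncation both documents score identically against the query, so the ranking information carried by the dropped entries is lost.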