llm data_error ai_generated true

chromadb.errors.DimensionError: Inserted embedding dimension (1536) does not match collection dimension (768)

ID: llm/embedding-length-mismatch-on-insert

Fix Rate: 95%
Confidence: 90%
Evidence: 1
First Seen: 2023-11-05

Version Compatibility

Version | Status | Introduced | Deprecated | Notes
chromadb>=0.4.0 | active | - | - | -
sentence-transformers>=2.2.0 | active | - | - | -
text-embedding-3-small | active | - | - | -
text-embedding-ada-002 | active | - | - | -

Root Cause

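The embedding model used for insertion produces vectors of a different length than the collection expects (here 1536 versus 768). This usually happens after switching embedding models, or when the model version used at insert time differs from the one that first populated the collection.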
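The mismatch can be caught before the insert call is made. The following is a minimal sketch, not part of Chroma: `check_dimensions` and `EmbeddingDimensionMismatch` are hypothetical names, and the expected dimension is assumed to be known to the caller.

```python
from typing import Sequence


class EmbeddingDimensionMismatch(ValueError):
    """Raised when a vector's length differs from the collection's dimension."""


def check_dimensions(embeddings: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise before insertion if any vector has the wrong length."""
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise EmbeddingDimensionMismatch(
                f"embedding {index} has dimension {len(vector)}, "
                f"but the collection expects {expected_dim}"
            )


# Assumed scenario from the error message: a 768-dimensional collection
# receiving vectors from a model that outputs 1536 dimensions.
collection_dim = 768
new_vectors = [[0.0] * 1536]

try:
    check_dimensions(new_vectors, collection_dim)
except EmbeddingDimensionMismatch as exc:
    print(f"refusing to insert: {exc}")
```

Failing early like this keeps the error close to the code that chose the model, rather than surfacing inside the database client.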

generic


Official Documentation

https://docs.trychroma.com/usage-guide#creating-collections

Workarounds

  1. 95% success: Create a new collection and re-embed all documents with the model you intend to keep. Example: `collection = client.create_collection(name="my_collection_1536", embedding_function=embedding_function, metadata={"hnsw:space": "cosine"})`, where `embedding_function` outputs 1536 dimensions. The call takes no dimension argument; the collection adopts the size of the vectors it receives. Use a new name, or delete the old collection first, because `create_collection` fails if the name already exists.
  2. 80% success: If a second embedding model is only needed temporarily, keep a mapping from model to collection, or use a router that selects the correct collection based on the model.
  3. 70% success: Standardize on a single embedding model whose output size can be set explicitly (e.g., text-embedding-3-small with the `dimensions` parameter), and use the same size for every insert and query.
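Workaround 1 can be sketched roughly as follows. This is an illustration under assumptions, not the official migration path: `rebuild_collection` and `embed` are hypothetical names, `client` is assumed to be a chromadb client, and every record in the old collection is assumed to have a stored document that can be re-embedded.

```python
from typing import Callable, List, Sequence


def rebuild_collection(
    client,
    old_name: str,
    new_name: str,
    embed: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,
) -> int:
    """Copy documents into a fresh collection, re-embedding each batch.

    `embed` is a hypothetical callable that returns one vector per input text.
    Returns the number of records copied.
    """
    old = client.get_collection(name=old_name)
    # Fetch ids, documents and metadata; the old embeddings are not reused.
    records = old.get(include=["documents", "metadatas"])
    ids = records["ids"]
    documents = records["documents"]
    metadatas = records["metadatas"]

    # A new name is needed: create_collection fails if the name already exists.
    new = client.create_collection(name=new_name, metadata={"hnsw:space": "cosine"})

    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        batch_docs = documents[start:end]
        new.add(
            ids=ids[start:end],
            documents=batch_docs,
            metadatas=metadatas[start:end] if metadatas else None,
            # Vectors come from the new model, so the new collection
            # takes on that model's dimension.
            embeddings=embed(batch_docs),
        )
    return len(ids)
```

Once the new collection has been verified, the application can be pointed at `new_name` and the old collection deleted.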

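For workaround 2, a model-to-collection router could look something like the sketch below. The `Route` class, the `ROUTES` registry, and the collection names are all hypothetical, and `all-mpnet-base-v2` is used only as an example of a 768-dimensional sentence-transformers model.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Route:
    collection_name: str
    dimension: int


# Hypothetical registry: one collection per embedding model.
ROUTES = {
    "text-embedding-3-small": Route("docs_te3_small_1536", 1536),
    "all-mpnet-base-v2": Route("docs_mpnet_768", 768),
}


def route_for(model_name: str, vector: Sequence[float]) -> str:
    """Return the collection name for a model, checking the vector size."""
    if model_name not in ROUTES:
        raise KeyError(f"no collection registered for model {model_name!r}")
    route = ROUTES[model_name]
    if len(vector) != route.dimension:
        raise ValueError(
            f"{model_name} is registered with dimension {route.dimension}, "
            f"got a vector of length {len(vector)}"
        )
    return route.collection_name


# The returned name would then be passed to client.get_collection(...).
print(route_for("all-mpnet-base-v2", [0.0] * 768))
```

Queries need the same routing as inserts: a query vector from one model is only meaningful against the collection built with that model.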

Dead Ends

Common approaches that don't work:

  1. 100% fail: Upserting the new vectors into the existing collection. The collection's dimension is fixed once established; upserting does not change it.

  2. 95% fail: Padding or truncating vectors to fit. This destroys semantic meaning and leads to poor retrieval results, because the vector space becomes inconsistent.

  3. 90% fail: Inserting vectors from a second model into the same collection. Different models have different output dimensions; you must use the same model for all inserts in a collection.
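One way to enforce the same-model rule is to record the model in the collection's metadata at creation time and check it before each insert. The sketch below assumes a chromadb client and collection; the helper names and the `embedding_model` and `embedding_dim` metadata keys are hypothetical conventions, not something Chroma interprets.

```python
from typing import List, Sequence


def create_pinned_collection(client, name: str, model_name: str, dimension: int):
    """Create a collection that remembers which model populates it."""
    return client.create_collection(
        name=name,
        # Hypothetical keys, stored as ordinary user metadata.
        metadata={"embedding_model": model_name, "embedding_dim": dimension},
    )


def add_checked(
    collection,
    model_name: str,
    ids: Sequence[str],
    documents: Sequence[str],
    embeddings: List[List[float]],
) -> None:
    """Insert only if the model and vector size match the pinned values."""
    meta = collection.metadata or {}
    pinned_model = meta.get("embedding_model")
    pinned_dim = meta.get("embedding_dim")

    if pinned_model is not None and pinned_model != model_name:
        raise ValueError(
            f"collection is pinned to {pinned_model!r}, got {model_name!r}"
        )
    if pinned_dim is not None:
        for vector in embeddings:
            if len(vector) != pinned_dim:
                raise ValueError(
                    f"expected dimension {pinned_dim}, got {len(vector)}"
                )

    collection.add(ids=list(ids), documents=list(documents), embeddings=embeddings)
```

Routing all inserts through a helper like `add_checked` turns a silent model switch into an immediate, descriptive error.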