# openai.BadRequestError: vector length must be 1 for cosine similarity

- **ID:** `llm/embedding-vector-normalization-mismatch`
- **Domain:** llm
- **Category:** data_error
- **Verification:** ai_generated
- **Fix Rate:** 80%

## Root Cause

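OpenAI's embedding API returns unit-normalized vectors (L2 norm of 1) by default. Custom embedding models or manual preprocessing may produce unnormalized vectors, and a vector store or similarity routine that assumes unit-length input can then reject them or return incorrect scores. "Vector length" in the error message refers to the L2 norm, not the number of dimensions.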
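
The distinction matters because a plain dot product equals cosine similarity only when both vectors have an L2 norm of 1. The following is a minimal sketch in dependency-free Python; the helper names `l2_norm`, `dot`, and `cosine` and the sample vectors are illustrative assumptions, not part of any library or of the original report.

```python
import math

def l2_norm(v):
    """Euclidean length of a vector (unrelated to its dimension, len(v))."""
    return math.sqrt(sum(x * x for x in v))

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; scale-invariant by construction."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

raw = [3.0, 4.0]     # norm 5.0, not unit length
unit = [0.6, 0.8]    # same direction, scaled to unit length
query = [1.0, 0.0]   # already unit length

print(l2_norm(raw))        # 5.0
print(dot(raw, query))     # 3.0, outside the [-1, 1] range of a cosine
print(cosine(raw, query))  # 0.6
print(dot(unit, query))    # 0.6
```

A store that shortcuts cosine similarity to a dot product would score `raw` as 3.0 instead of 0.6, which is one plausible way an unnormalized vector leads to a rejection or a wrong ranking.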

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| openai==1.3.0 | active | — | — |
| openai==1.12.0 | active | — | — |
| text-embedding-ada-002 | active | — | — |
| text-embedding-3-small | active | — | — |
| text-embedding-3-large | active | — | — |

## Workarounds

1. **Normalize vectors manually before insertion or query.** (95% success)
   ```
   import numpy as np
   vector = vector / np.linalg.norm(vector)  # assumes a non-zero NumPy array
   ```
2. **Use OpenAI's default embeddings, which are already normalized; avoid custom models or manual preprocessing that changes the norm unless necessary.** (90% success)
3. **Configure the vector database to use inner product (dot product) distance instead of cosine similarity if supported (e.g., `metric="dotproduct"` in Pinecone or the `dot` distance in Weaviate). Rankings match cosine similarity only when the vectors are unit-normalized.** (75% success)
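
The first workaround can be made a little more defensive by checking the norm before writing to the store and by rejecting zero vectors, which cannot be normalized. This is a minimal sketch in dependency-free Python; `normalize`, `is_unit_norm`, the tolerances, and the sample `embedding` are hypothetical names and values chosen for illustration.

```python
import math

def normalize(vector, eps=1e-12):
    """Return a unit-length copy of `vector`; reject zero vectors."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm < eps:
        raise ValueError("cannot normalize a zero-length vector")
    return [x / norm for x in vector]

def is_unit_norm(vector, tol=1e-6):
    """True if the L2 norm is within `tol` of 1.0."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol

# Stand-in for the output of a custom embedding model (assumed values).
embedding = [0.5, -1.5, 2.0, 0.25]

# Normalize only when needed, then verify before insertion or query.
if not is_unit_norm(embedding):
    embedding = normalize(embedding)

print(is_unit_norm(embedding))  # True
```

Applying the same check to both stored vectors and query vectors keeps the two sides consistent; normalizing only one side still yields scores on the wrong scale.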

## Dead Ends

- **Switching to a different embedding model**: Different embedding models produce vectors with different normalization properties; the root cause is not the model but the normalization step. (65% fail)
- **Padding vectors to change their dimension**: Dimension is unrelated to normalization; padding introduces noise and doesn't fix the length constraint. (80% fail)
