llm data_error ai_generated true

openai.BadRequestError: vector length must be 1 for cosine similarity

ID: llm/embedding-vector-normalization-mismatch

80% fix rate

85% confidence

1 piece of evidence

First seen 2023-11-15

Version compatibility

Version	Status	Introduced	Deprecated	Notes
openai==1.3.0	active	—	—	—
openai==1.12.0	active	—	—	—
text-embedding-ada-002	active	—	—	—
text-embedding-3-small	active	—	—	—
text-embedding-3-large	active	—	—	—

Root cause analysis

OpenAI's embedding API returns unit-normalized vectors (L2 norm of 1) by default, but custom embedding models or manual preprocessing may produce unnormalized vectors. A pipeline that requires unit length, or that computes cosine similarity as a plain dot product, then fails or returns incorrect results.
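To confirm whether a given vector is the problem, its L2 norm can be checked before it reaches the similarity step. The following is a minimal sketch, assuming NumPy is available; the helper name `is_unit_normalized` and the tolerance `tol` are hypothetical and not part of any OpenAI or vector-database API.

```python
import numpy as np

def is_unit_normalized(vector, tol=1e-3):
    """Return True when the vector's L2 norm is within tol of 1.

    tol is an assumed tolerance; tighten or loosen it for your data.
    """
    norm = np.linalg.norm(np.asarray(vector, dtype=np.float64))
    return bool(abs(norm - 1.0) <= tol)

raw = np.array([3.0, 4.0])           # L2 norm is 5, so not unit length
unit = raw / np.linalg.norm(raw)     # rescaled to L2 norm 1

print(is_unit_normalized(raw))
print(is_unit_normalized(unit))
```

Running a check like this on a sample of stored vectors and on query vectors shows which side of the pipeline produces unnormalized output.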

generic

Official documentation

https://platform.openai.com/docs/guides/embeddings/embedding-models

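Solutions

Normalize vectors manually before inserting or querying: `vector = vector / np.linalg.norm(vector)`

Use OpenAI's default embeddings (already normalized); avoid custom models or manual normalization unless necessary.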

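If the vector database supports it, configure the index to use inner-product (dot-product) distance instead of cosine similarity (for example, `metric='dotproduct'` in Pinecone or `distance: 'dot'` in Weaviate). Note that on unnormalized vectors the inner product also reflects vector magnitude, so rankings can differ from cosine rankings.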
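The first solution can be sketched a little more defensively than the one-liner. This is an illustrative sketch only, assuming NumPy; `l2_normalize`, `cosine_similarity`, and the `eps` threshold are hypothetical names chosen here. It handles a single vector or a batch of row vectors and refuses zero-length vectors, which would otherwise cause a division by zero.

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale a vector, or each row of a 2-D batch, to unit L2 norm."""
    arr = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(arr, axis=-1, keepdims=True)
    if np.any(norms < eps):
        # A zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a zero-length vector")
    return arr / norms

def cosine_similarity(a, b):
    """Reference cosine similarity that does not assume unit length."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

raw_dot = float(np.dot(a, b))                               # depends on magnitude
unit_dot = float(np.dot(l2_normalize(a), l2_normalize(b)))  # matches the cosine
reference = cosine_similarity(a, b)
```

After normalization the plain dot product agrees with the reference cosine similarity, which is why a store that uses the dot product as a shortcut only gives correct scores on unit-length input.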

Failed attempts

Common but ineffective approaches:

65% failure rate
Different embedding models produce vectors with different normalization properties; the root cause is not the model but the normalization step.
80% failure rate
Dimension is unrelated to normalization; padding introduces noise and doesn't fix the length constraint.
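A small illustration of why changing the dimension does not help, assuming NumPy and a toy two-dimensional vector: appending zeros changes the dimension but leaves the L2 norm exactly where it was, so a unit-length constraint is still violated.

```python
import numpy as np

v = np.array([3.0, 4.0])            # L2 norm 5.0, not unit length
zero_padded = np.pad(v, (0, 3))     # appends three zeros: dimension 2 -> 5

norm_before = float(np.linalg.norm(v))
norm_after = float(np.linalg.norm(zero_padded))

print(zero_padded.shape, norm_before, norm_after)
```

Padding with nonzero values would change the norm, but only by injecting components unrelated to the embedding.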