llm
data_error
ai_generated
partial
Warning: Input text truncated to 8192 tokens for embedding model 'text-embedding-3-small' — embedding quality may degrade
ID: llm/embedding-truncation-mismatch
Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2024-02-20
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| openai>=1.0.0 | active | — | — | — |
| text-embedding-3-small | active | — | — | — |
| text-embedding-3-large | active | — | — | — |
| text-embedding-ada-002 | active | — | — | — |
Root Cause
Embedding models have a maximum input token limit (e.g., 8192 for text-embedding-3-small); longer inputs are silently truncated, losing semantic information at the end of the text.
Official Documentation
https://platform.openai.com/docs/guides/embeddings/embedding-models
Workarounds
- 90% success: Pre-process the input text by truncating it to the model's token limit with the same tokenizer the model uses (e.g., tiktoken for OpenAI models) before sending it to the API, and log the truncation explicitly.
- 85% success: Use a sliding-window or chunking strategy: split long documents into overlapping chunks that each fit within the model's token limit, embed each chunk separately, and store all embeddings with metadata for retrieval.
- 75% success: For RAG pipelines, prioritize embedding the most semantically important parts of the text (e.g., the beginning and key sections) rather than relying on automatic truncation of the end.
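The first two workarounds can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the helper names `truncate_tokens` and `chunk_tokens`, the logger name, and the default overlap of 256 tokens are all assumptions made for the example, and the 8192 limit is simply the value quoted in the warning. The functions work on token ID lists so they stay independent of any particular tokenizer.

```python
import logging
from typing import List, Sequence

logger = logging.getLogger("embedding_preprocess")  # hypothetical logger name

MAX_TOKENS = 8192  # value from the warning; verify against your model's documentation


def truncate_tokens(tokens: Sequence[int], limit: int = MAX_TOKENS) -> List[int]:
    """Cut a token sequence to `limit` tokens and log how many were dropped."""
    if len(tokens) <= limit:
        return list(tokens)
    logger.warning(
        "Truncating input from %d to %d tokens (%d dropped)",
        len(tokens), limit, len(tokens) - limit,
    )
    return list(tokens[:limit])


def chunk_tokens(
    tokens: Sequence[int], size: int = MAX_TOKENS, overlap: int = 256
) -> List[List[int]]:
    """Split tokens into windows of at most `size` tokens.

    Each window after the first repeats the last `overlap` tokens of the
    previous one, so text near a boundary appears in both chunks.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the current window already reaches the end of the input
    return chunks
```

With OpenAI models, the token lists would typically come from tiktoken, for example `enc = tiktoken.encoding_for_model("text-embedding-3-small")` followed by `enc.encode(text)`, and `enc.decode(chunk)` turns each chunk back into text before it is embedded. Each chunk should be stored with metadata such as the document ID and chunk index so that retrieval hits can be traced back to the source.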
Dead Ends
Common approaches that don't work:
- 95% fail: Setting max_tokens on the embedding request. The embedding API does not accept a max_tokens parameter; truncation is automatic and controlled by model limits.
- 70% fail: Averaging embeddings from different chunks. This loses positional and semantic relationships and is not equivalent to a single embedding of the full text.
- 85% fail: Using the truncated embedding as-is. Truncated embeddings miss critical information from the end of the text, leading to poor retrieval quality in RAG systems.
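To illustrate why averaging chunk embeddings tends to disappoint, the sketch below compares it with scoring a document by its best-matching chunk, which is one common way to use per-chunk embeddings at query time. The 2-D vectors are toy values standing in for real embeddings, and the helper names `cosine`, `mean_vector`, and `best_chunk_score` are hypothetical, chosen only for this example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (the 'averaging' dead end)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def best_chunk_score(query: Sequence[float], chunk_vectors: Sequence[Sequence[float]]) -> float:
    """Score a document by its single most similar chunk."""
    return max(cosine(query, v) for v in chunk_vectors)


# Two chunks about unrelated topics, modelled as orthogonal toy vectors.
chunks = [[1.0, 0.0], [0.0, 1.0]]
query = [0.0, 1.0]  # matches the second chunk exactly

averaged = cosine(query, mean_vector(chunks))  # diluted by the unrelated chunk
per_chunk = best_chunk_score(query, chunks)    # the matching chunk is found directly
```

In this toy case the averaged vector scores about 0.71 against the query while the per-chunk score is 1.0: the average blends both topics, so an exact match on one chunk no longer looks like one.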