llm runtime_error ai_generated true

Error: context length exceeded while processing streaming chunks — partial response returned

ID: llm/context-window-exceeded-with-chunked-streaming

Fix rate: 80%

Confidence: 85%

Evidence count: 1

First seen: 2024-03-15

Version compatibility

Version	Status	Introduced	Deprecated	Notes
openai==1.12.0	active	—	—	—
anthropic==0.25.0	active	—	—	—
langchain==0.1.12	active	—	—	—
gpt-4-turbo-2024-04-09	active	—	—	—
claude-3-opus-20240229	active	—	—	—

Root cause analysis

During streaming, the combined input and output token count exceeds the model's context window, so the API cuts the response off mid-stream without raising a clear error.

generic

Official documentation

https://platform.openai.com/docs/guides/rate-limits/error-mitigation

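Solutions

Before streaming, count the prompt's tokens with tiktoken and truncate the input so that room remains for the output (for example: `import tiktoken; enc = tiktoken.encoding_for_model('gpt-4'); tokens = enc.encode(prompt); prompt = enc.decode(tokens[:120000])`, which keeps at most the first 120000 tokens).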
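The truncation step can be sketched as a small helper that takes the token budget as parameters instead of a hard-coded number. This is a minimal sketch, not a definitive implementation: `fit_prompt` and `WhitespaceEncoder` are hypothetical names introduced here, and the whitespace encoder is only a stand-in so the example runs without tiktoken installed. It does not count tokens the way a real model tokenizer does.

```python
class WhitespaceEncoder:
    """Hypothetical stand-in with tiktoken-style encode/decode methods.

    Each whitespace-separated word counts as one token, which is NOT how
    real model tokenizers count. It exists only to make the sketch runnable.
    """

    def encode(self, text):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)


def fit_prompt(prompt, enc, context_window, reserved_output):
    """Trim `prompt` so that prompt tokens + reserved output fit the window.

    `enc` is any object exposing encode(str) -> list and decode(list) -> str.
    """
    budget = context_window - reserved_output
    if budget <= 0:
        raise ValueError("reserved_output must be smaller than context_window")
    tokens = enc.encode(prompt)
    if len(tokens) <= budget:
        return prompt  # already fits, leave untouched
    # Keep the first `budget` tokens; keeping the tail instead may suit
    # chat histories where the most recent text matters most.
    return enc.decode(tokens[:budget])


if __name__ == "__main__":
    demo = fit_prompt(
        "one two three four five",
        WhitespaceEncoder(),
        context_window=5,
        reserved_output=2,
    )
    print(demo)  # one two three
```

With tiktoken installed, the object returned by `tiktoken.encoding_for_model('gpt-4')` could be passed as `enc`, since it exposes `encode` and `decode`. Passing `context_window=128000` and `reserved_output=8000` would reproduce the 120000-token threshold above, assuming a 128k-token window. Chat-formatted requests add a few tokens of per-message overhead, so a small safety margin is advisable.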

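Reduce the output length by lowering max_tokens, and implement a loop that resumes generation from the last complete sentence whenever the response is truncated.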
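One possible shape for such a resume loop is sketched below. It is a sketch under stated assumptions: `generate` is a hypothetical callable wrapping the real API call and returning `(text, truncated)`, where `truncated` is true when the model stopped because of a length limit. `trim_to_last_sentence` and `generate_with_resume` are likewise names introduced here for illustration, and the sentence detection is a simple punctuation heuristic.

```python
import re


def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last '.', '!' or '?'.

    Returns an empty string if the text contains no complete sentence.
    """
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    return text[: ends[-1]] if ends else ""


def generate_with_resume(prompt, generate, max_rounds=5):
    """Call `generate` repeatedly, resuming after the last complete sentence.

    `generate(full_prompt)` is a hypothetical wrapper around the real API
    call. It must return (text, truncated).
    """
    answer = ""
    for _ in range(max_rounds):
        # Feed back the completed sentences so the model continues from there.
        text, truncated = generate(prompt + answer)
        if not truncated:
            return answer + text
        # Discard the dangling fragment and keep only whole sentences.
        answer += trim_to_last_sentence(text)
    return answer  # give up after max_rounds; may still be incomplete


if __name__ == "__main__":
    # Scripted fake responses standing in for a real model.
    chunks = iter([
        ("First point. Second po", True),
        (" Second point. Done.", False),
    ])
    result = generate_with_resume("Q: ", lambda _prompt: next(chunks))
    print(result)  # First point. Second point. Done.
```

Because each round appends the partial answer to the prompt, the input grows on every iteration. The combined text should still be checked against the token budget before each call. The `max_rounds` cap prevents an endless loop when no round produces a complete sentence.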

Failed attempts

Common but ineffective approaches:

85% failure rate
Increasing max_tokens in the request doesn't help because the total (input + output) exceeds the model's limit, and max_tokens only caps output.
95% failure rate
Retrying the same request with no changes will reproduce the error since the context is still too large.
90% failure rate
Switching to a different streaming library (e.g., from openai to httpx) doesn't solve the underlying token limit issue.