{
  "id": "llm/context-window-exceeded-with-chunked-streaming",
  "signature": "Error: context length exceeded while processing streaming chunks — partial response returned",
  "signature_zh": "错误：处理流式数据块时超出上下文长度 — 返回部分响应",
  "regex": "context length exceeded|partial response returned|stream truncated",
  "domain": "llm",
  "category": "runtime_error",
  "subcategory": null,
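  "root_cause": "During streaming, cumulative input and output tokens hit the model's context window (or the max_tokens cap), and the API ends the stream mid-response without raising an exception. The only signal is the stop reason on the final chunk: finish_reason 'length' (OpenAI) or stop_reason 'max_tokens' (Anthropic). Code that only concatenates streamed text treats the partial response as complete.",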
  "root_cause_type": "generic",
  "root_cause_zh": "在流式处理期间，累积的输入和输出令牌超过了模型的上下文窗口，导致API在流中间截断响应，而没有明确的错误提示。",
  "versions": [
    {
      "version": "openai==1.12.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "anthropic==0.25.0",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "langchain==0.1.12",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "gpt-4-turbo-2024-04-09",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    },
    {
      "version": "claude-3-opus-20240229",
      "introduced": null,
      "deprecated": null,
      "removed": null,
      "behavior_change": null,
      "status": "active"
    }
  ],
  "os_specific": {},
  "dead_ends": [
    {
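      "action": "Increase max_tokens in the request",
      "why_fails": "max_tokens only caps the output. It does not enlarge the context window, and the total (input + output) is already at the model's limit. On OpenAI, a max_tokens value that pushes input + max_tokens past the window makes the request fail with a context-length error before streaming starts.",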
      "fail_rate": 0.85,
      "condition": "",
      "sources": []
    },
    {
      "action": "Retry the same request without changes",
      "why_fails": "Retrying the same request unchanged reproduces the truncation, because the context is still too large.",
      "fail_rate": 0.95,
      "condition": "",
      "sources": []
    },
    {
      "action": "Switch to a different streaming library (e.g., from openai to httpx)",
      "why_fails": "Switching to a different streaming library (e.g., from openai to httpx) doesn't solve the underlying token limit issue.",
      "fail_rate": 0.9,
      "condition": "",
      "sources": []
    }
  ],
  "workarounds": [
    {
      "action": "Count input tokens with tiktoken before streaming and truncate the prompt so that input plus output fits within the context window.",
      "success_rate": 0.85,
      "how": "Count tokens with tiktoken before streaming and cut the prompt so that input tokens + max_tokens fits the context window (128,000 tokens for gpt-4-turbo). For example: `import tiktoken`, then `enc = tiktoken.encoding_for_model('gpt-4')`, `tokens = enc.encode(prompt)`, `budget = 128000 - max_tokens`, and if `len(tokens) > budget`, set `prompt = enc.decode(tokens[:budget])`. Chat message formatting adds a few tokens per message, so leave a small extra margin. tiktoken reproduces OpenAI tokenizers only; for Claude models its counts are an approximation.",
      "condition": "",
      "sources": []
    },
    {
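      "action": "Lower max_tokens so the request fits the context window, and resume generation in a follow-up request when the stream is cut off.",
      "success_rate": 0.75,
      "how": "Set max_tokens so that input tokens + max_tokens stays within the context window. After each stream, check the stop reason on the final chunk: finish_reason == 'length' (OpenAI) or stop_reason == 'max_tokens' (Anthropic) means the output was truncated. Trim the partial text back to its last complete sentence and send a follow-up request asking the model to continue from there, looping until the stop reason indicates normal completion. Each continuation adds the earlier output to the context, so recompute the token budget on every iteration.",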
      "condition": "",
      "sources": []
    }
  ],
  "workarounds_zh": [
    "在流式处理前，使用tiktoken计算总令牌数（例如：`import tiktoken; enc = tiktoken.encoding_for_model('gpt-4'); tokens = enc.encode(prompt); if len(tokens) > 120000: truncate prompt`）。截断输入以为输出留出空间。",
    "通过降低max_tokens来减少输出长度，并实现一个循环，在截断时从最后一个完整句子恢复生成。"
  ],
  "transition_graph": {
    "leads_to": [],
    "preceded_by": [],
    "frequently_confused_with": []
  },
  "official_doc_url": "https://platform.openai.com/docs/guides/rate-limits/error-mitigation",
  "official_doc_section": null,
  "error_code": null,
  "verification_tier": "ai_generated",
  "confidence": 0.85,
  "fix_success_rate": 0.8,
  "resolvable": "true",
  "first_seen": "2024-03-15",
  "last_confirmed": "2024-06-01",
  "last_updated": "2024-06-01",
  "evidence_count": 1,
  "tags": [],
  "locale": "en",
  "aliases": []
}