Tags: llm · runtime_error · ai_generated: true

Warning: Prompt caching disabled because system message changed between requests

ID: llm/prompt-caching-ignored-with-system-message-change

Fix rate: 90% · Confidence: 86% · Evidence: 1 · First seen: 2024-06-01

Version Compatibility

| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| anthropic-python>=0.25.0 | active | | | |
| claude-3-opus-20240229 | active | | | |
| claude-3-sonnet-20240229 | active | | | |

Root Cause

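LLM API prompt caching (e.g., Anthropic's prompt caching) reuses work only when the beginning of a new request exactly matches a previously cached prefix, up to the `cache_control` breakpoint. The system prompt is part of that prefix (in Anthropic's API the prefix order is tools, then system, then messages), so any change to the system message, even a single character such as an embedded timestamp, produces a different prefix. The cache then misses for the system prompt and everything after it.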
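To make the exact-match rule concrete, here is a toy model. It is an illustrative sketch, not Anthropic's server-side implementation: the hypothetical `prefix_key` hashes the system prompt plus earlier messages, and the hypothetical `ToyPrefixCache` reports a hit only when that exact hash has been seen before, so a one-character edit to the system prompt produces a miss.

```python
import hashlib
import json


def prefix_key(system, messages):
    """Toy cache key: a hash over the exact request prefix (system prompt + prior messages).

    Illustrative only; this mimics exact-prefix matching, not Anthropic's implementation.
    """
    canonical = json.dumps({"system": system, "messages": messages}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyPrefixCache:
    """Hypothetical in-memory cache that hits only on a byte-identical prefix."""

    def __init__(self):
        self._keys = set()

    def lookup(self, system, messages):
        """Return True on a hit; on a miss, store the prefix so a later identical request hits."""
        key = prefix_key(system, messages)
        if key in self._keys:
            return True
        self._keys.add(key)
        return False


cache = ToyPrefixCache()
static = "You are a math tutor."
print(cache.lookup(static, []))                        # False: first request writes the entry
print(cache.lookup(static, []))                        # True: identical prefix is reused
print(cache.lookup(static + " Date: 2024-06-01", []))  # False: any edit changes the key
```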

Official Documentation

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Workarounds

  1. 90% success: Keep the system prompt byte-for-byte identical across requests that should share the cache. In the Anthropic Messages API the system prompt is the top-level `system` parameter, not a message with `"role": "system"`. Define it once, e.g. `SYSTEM = [{"type": "text", "text": "You are a helpful assistant.", "cache_control": {"type": "ephemeral"}}]`, and pass the same object as `system=SYSTEM` on every call. A prompt this short is below the minimum cacheable length (1024 tokens for Claude 3 Opus), so caching only takes effect once the static prefix is long enough.
  2. 85% success: If part of the system prompt must change, move the variable part into the user message and keep the system prompt static. Example: system = "You are a math tutor.", user = "Solve: {problem}".
  3. 70% success: Verify that caching works by checking the `usage` object in each response body: `cache_creation_input_tokens` > 0 means the prefix was written to the cache, and `cache_read_input_tokens` > 0 means it was served from the cache.
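The workarounds above can be combined in a short sketch. It assumes an Anthropic Python SDK version where `client.messages.create` accepts `cache_control` blocks (older releases exposed prompt caching only as a beta feature); `SYSTEM_BLOCKS`, `build_request`, `_usage_field` and `cache_status` are hypothetical names. The code only builds request arguments and classifies the `usage` counters, so it runs without an API key.

```python
# Minimal sketch of workarounds 1-3. SYSTEM_BLOCKS, build_request, _usage_field and
# cache_status are hypothetical helpers; the request keys follow the Anthropic Messages API.

# Built once and reused, so every request sends an identical system prompt.
# Placeholder text: a real cacheable prefix must reach the model's minimum length
# (1024 tokens for Claude 3 Opus).
SYSTEM_BLOCKS = [
    {
        "type": "text",
        "text": "You are a math tutor.",
        "cache_control": {"type": "ephemeral"},  # cache breakpoint at the end of the system prompt
    }
]


def build_request(problem, model="claude-3-opus-20240229"):
    """Return keyword arguments for client.messages.create()."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": SYSTEM_BLOCKS,  # static: the same object on every call
        "messages": [
            # The variable part lives in the user turn, after the cached prefix.
            {"role": "user", "content": f"Solve: {problem}"}
        ],
    }


def _usage_field(usage, name):
    """Read a counter from a dict or an SDK usage object; missing or None counts as 0."""
    value = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
    return value or 0


def cache_status(usage):
    """Classify one response by its prompt-caching counters."""
    if _usage_field(usage, "cache_read_input_tokens") > 0:
        return "hit"         # prefix was served from the cache
    if _usage_field(usage, "cache_creation_input_tokens") > 0:
        return "write"       # prefix was written; the next identical request can hit
    return "not cached"      # e.g. prefix too short or no cache_control breakpoint
```

In a live call this would look like `response = client.messages.create(**build_request("2x + 3 = 7"))` followed by `cache_status(response.usage)`, with `client = anthropic.Anthropic()`. Once the system prompt is long enough to be cached, the first request should report "write" and an identical request within the `ephemeral` cache lifetime (about five minutes) should report "hit".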

Dead Ends

Common approaches that don't work:

  1. 90% fail

    This defeats the purpose of caching and actually guarantees cache misses.

  2. 70% fail

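    Caching only pays off for content that repeats across requests. Per-request user input, such as each new question, differs on every call, so a `cache_control` breakpoint placed after it is never hit; place the breakpoint at the end of the static prefix (typically the system prompt) instead.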

  3. 80% fail

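    The warning means the cached prefix cannot be reused. Ignoring it means every request pays the full input-token price, or more if `cache_control` is still set, because each changed prefix is billed as a new cache write.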
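The cost argument in the last dead end can be checked with simple arithmetic. The sketch below assumes Anthropic's published multipliers for the 5-minute cache at the time of writing (cache writes at 1.25x and cache reads at 0.1x the base input price; verify against current pricing). The function `prefix_input_cost` and the workload numbers are hypothetical.

```python
def prefix_input_cost(prefix_tokens, requests, base_price, mode,
                      write_multiplier=1.25, read_multiplier=0.10):
    """Estimate the input cost of a shared prefix across `requests` calls.

    mode: "off"  - no cache_control, every call pays the base price
          "miss" - cache_control set but the prefix changes, so every call is a cache write
          "hit"  - stable prefix: one cache write, then cache reads
    Default multipliers are assumptions based on 5-minute cache pricing at the time of writing.
    """
    if requests <= 0:
        return 0.0
    per_call = prefix_tokens * base_price
    if mode == "off":
        return requests * per_call
    if mode == "miss":
        return requests * per_call * write_multiplier
    if mode == "hit":
        return per_call * write_multiplier + (requests - 1) * per_call * read_multiplier
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical workload: a 2,000-token system prompt sent 100 times, price normalized to 1.0 per token.
for mode in ("off", "miss", "hit"):
    print(mode, prefix_input_cost(2000, 100, 1.0, mode))
```

Under these assumptions the changing prefix costs 250,000 units versus 200,000 without caching, while a stable prefix costs about 22,300: a prefix that keeps changing is worse than not caching at all.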