llm
runtime_error
ai_generated
true
Warning: Prompt caching disabled because system message changed between requests
ID: llm/prompt-caching-ignored-with-system-message-change
Fix rate: 90%
Confidence: 86%
Evidence count: 1
First seen: 2024-06-01
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| anthropic-python>=0.25.0 | active | — | — | — |
| claude-3-opus-20240229 | active | — | — | — |
| claude-3-sonnet-20240229 | active | — | — | — |
Root cause analysis
LLM API prompt caching (e.g., Anthropic's prompt caching) reuses a cached prefix only when the content up to the cache breakpoint, including the system message, is identical across requests; any change to the system message invalidates the cache from that point on.
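One way to catch accidental drift in the system message before it reaches the API is to fingerprint it locally and compare fingerprints between requests. The sketch below is only a client-side diagnostic: `system_fingerprint` and the sample blocks are hypothetical names introduced here, and the hash is not how any provider computes its cache key.

```python
import hashlib
import json


def system_fingerprint(system_blocks):
    """Return a SHA-256 hex digest of the system blocks (local check only)."""
    # sort_keys makes the digest independent of dict key order.
    canonical = json.dumps(system_blocks, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical system prompts: one static, one with a per-request date injected.
static_blocks = [{"type": "text", "text": "You are a math tutor."}]
drifting_blocks = [
    {"type": "text", "text": "You are a math tutor. Today is 2024-06-01."}
]

if system_fingerprint(static_blocks) != system_fingerprint(drifting_blocks):
    print("System message changed; a cached prefix would not be reused.")
```

Logging the fingerprint with each request makes it easy to see which code path (a timestamp, a user name, a reordered instruction) is altering the system message.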
Official documentation
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
Solutions
- Ensure system messages are byte-for-byte identical across requests that should benefit from caching. With the Anthropic Messages API, the system prompt is passed through the top-level `system` parameter rather than as a `"role": "system"` message. Define it once, e.g. `system = [{"type": "text", "text": "You are a helpful assistant.", "cache_control": {"type": "ephemeral"}}]`, and reuse that same object for every request.
- If part of the system message must change, move the variable part into the user message and keep the system message static. Example: system = "You are a math tutor.", user = "Solve: {problem}".
- Monitor cache metrics to verify caching is working. The Anthropic API reports them in the `usage` object of the response (`cache_creation_input_tokens` and `cache_read_input_tokens`), not in a response header.
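The three steps above can be sketched together as follows. This is a minimal illustration, not a definitive implementation: `SYSTEM_BLOCKS`, `build_request`, and `cache_stats` are hypothetical helpers, the request is built as a plain dict so the example runs without network access, and the short placeholder system text stands in for a real prompt (prompts below the documented minimum cacheable length are not cached).

```python
from types import SimpleNamespace

# Static system blocks, defined once and reused for every request.
SYSTEM_BLOCKS = [
    {
        "type": "text",
        "text": "You are a math tutor.",
        "cache_control": {"type": "ephemeral"},
    }
]


def build_request(problem, model="claude-3-opus-20240229", max_tokens=1024):
    """Build request parameters; only the user message varies per call."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": SYSTEM_BLOCKS,
        "messages": [{"role": "user", "content": f"Solve: {problem}"}],
    }


def cache_stats(usage):
    """Summarize cache fields from a response's usage object."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    return {"created": created, "read": read, "hit": read > 0}


# Stand-in for response.usage so the sketch runs offline.
fake_usage = SimpleNamespace(
    cache_creation_input_tokens=0, cache_read_input_tokens=1500
)
print(cache_stats(fake_usage))
```

With the `anthropic` package installed and an API key configured, the dict would be passed as `client.messages.create(**build_request("2x + 3 = 7"))` and `cache_stats(response.usage)` called on the result; a nonzero `read` value on the second and later requests suggests the cached prefix is being reused.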
Ineffective attempts
Common but ineffective approaches:
- 90% failure rate: This defeats the purpose of caching and actually guarantees cache misses.
- 70% failure rate: Caching is most effective on system messages; user messages vary too much to benefit from caching.
- 80% failure rate: The warning indicates caching is disabled; ignoring it means paying for full compute on every request.