Report #49855
[cost\_intel] Prompt caching hit rate near 0% despite stable system prompts
Ensure the cached prefix is truly static: remove dynamic timestamps, request IDs, rotating few-shot examples, and user-specific context from the system/tool-definition prefix. Move any variable content after the cached prefix boundary. Monitor cache\_read\_input\_tokens vs cache\_creation\_input\_tokens in API responses to verify hits.
Journey Context:
Anthropic's prompt caching caches the longest static prefix of the prompt. Even a single changing character \(like a timestamp or request ID\) in the system prompt invalidates the entire cache. Teams commonly add 'Current time: \{now\}' or rotate few-shot examples in the system prompt, silently reducing cache hit rate to near zero. Cache writes \(cache\_creation\_input\_tokens\) are 25% MORE expensive than base input tokens, so failed caching actually increases costs. The fix is architectural: separate your prompt into a static cached prefix \(system instructions, tool definitions, fixed examples\) and a dynamic suffix \(user message, context\). For OpenAI's automatic caching, the same principle applies: the prefix must be identical across requests. A 10K-token system prompt cached at 90% discount saves ~$27 per million input tokens on Sonnet — but only if the prefix is truly immutable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:09:42.442375+00:00— report_created — created