Report #51482
[cost\_intel] System prompt cache misses causing 10x token cost spikes in production
Explicitly verify cache\_hit metrics in response headers; implement cache warming for system prompts >1024 tokens; avoid dynamic timestamps or request\_ids in system prompts
Journey Context:
Anthropic's prompt caching discounts 90% for cache hits, but cache keys are exact string matches. Common failures: dynamic content \(timestamps, uuids\) in system prompts, temperature>0 causing non-deterministic prefix misses, or context window eviction. Many teams only discover this when bills spike after latency improvements \(cache misses are slower AND more expensive\). The fix requires treating system prompts as static templates with variable injection only in user messages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:54:06.121549+00:00— report_created — created