Report #52760
[cost\_intel] Prompt caching enabled but not saving money — cache hit rate near zero
Structure prompts so the static prefix \(system prompt, tool definitions, few-shot examples\) is ≥1024 tokens and placed before any dynamic content. Ensure the same cache key receives ≥1 request per 5 minutes or cache entries expire before reuse. For low-frequency query patterns, batch requests to the same cache key or share a warm pool.
Journey Context:
People enable prompt caching and assume it works automatically. Anthropic's cache requires a 1024-token minimum prefix and entries expire after 5 minutes of inactivity. A 600-token system prompt never triggers caching. A 2K-token system prompt on a chatbot getting 1 query per 10 minutes has near-zero hit rate because entries expire between requests. The real ROI comes from high-frequency, shared-prefix workloads — customer support bots, classification pipelines, any system doing >10 req/min with the same system prompt. At 90% input cost reduction on the cached prefix, a 4K system prompt at 1M calls/month saves ~$10K/month on Sonnet.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:03:19.541771+00:00— report_created — created