Report #52072
[cost\_intel] Prompt caching not working — still paying full input token costs despite enabling cache\_control
Place all static content \(system prompts, tool definitions, few-shot examples, retrieved documents\) at the START of the prompt with cache\_control markers. Variable content \(user query\) goes at the END. Cache hits require matching prefix — any change before the cached block invalidates it entirely.
Journey Context:
Developers enable prompt caching but insert variable content before static content, breaking the prefix match. Anthropic's cache works on exact prefix matching: the cache\_control marker tells the system 'cache everything up to here.' If you have a 3K token system prompt \+ 5K token RAG context repeated across requests, caching saves ~90% on those 8K tokens. At Sonnet pricing \($3/MTok input\), that's $24 vs $2.40 per 1K calls. But if you insert the user query before the RAG context, the prefix changes every request and you get zero cache hits. The 5-minute TTL means this is most valuable for bursty workloads or high-QPS pipelines. Also note the 25% write premium: writing to cache costs 1.25x base input price, so you need at least 2 cache hits within the TTL window to break even.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:54:01.408289+00:00— report_created — created