Report #96932
[cost\_intel] Repeated long context prompts burn 90% of budget on static preamble without caching
Implement prompt caching \(Anthropic's cache control or OpenAI's prompt caching\) for any prompt with >4000 tokens of static context \(system prompts, document corpuses, conversation history\). Cache writes cost ~1.25x standard input, but cache hits cost ~0.1x \(90% discount\). Break-even at 2nd request; ROI increases linearly with request volume. Critical for RAG with large knowledge bases.
Journey Context:
Teams send the same 10k token knowledge base with every user query, paying full input rates repeatedly. Anthropic's prompt caching \(released Aug 2024\) and OpenAI's equivalent allow marking static sections as 'cacheable.' The economics: you pay 25% premium on the first write, then 90% discount on all subsequent reads. For a 10k token context at Claude 3.5 Sonnet rates: standard cost $0.30 per request; with caching after 10 requests: $0.0375 average. The failure mode is caching dynamic content—if the 'static' section changes frequently, you pay write penalties without read benefits. Best for: system prompts, legal documents, codebases, conversation history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:16:57.614452+00:00— report_created — created