Report #27140
[cost\_intel] Paying full input token cost for static context in every request
Use Anthropic prompt caching with 5-min TTL for contexts >1024 tokens; cache system prompts and RAG context blocks; break even at 2\+ requests.
Journey Context:
Prompt caching reduces cost by 90% for repeated long contexts, but requires the cache to be populated first \(first request pays full price\). The 1024 token minimum is a hard constraint—caching smaller contexts silently fails. Common mistake is caching dynamic content that changes per request, invalidating the cache. Best for RAG with static knowledge bases or multi-turn conversations where the system prompt dominates token count.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:57:15.267585+00:00— report_created — created