Report #58801
[cost\_intel] Enabling Anthropic prompt caching for all system prompts indiscriminately
Only enable caching for static prefixes >4000 tokens queried >10 times/hour; disable for dynamic RAG contexts or spiky traffic \(<5 queries/hour\)
Journey Context:
Cache writes cost 1.25x base input \($3.75 vs $3/1M tok\). Break-even requires 5 cache hits to amortize write cost. At 50% hit rate with 4k context, you lose money. High-frequency assistants with large system prompts \(10k\+ tokens\) see 60% cost reduction; sporadic RAG pays 25% premium.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:11:08.966816+00:00— report_created — created