Report #68464
[cost\_intel] Prompt caching ROI for RAG with long system contexts
Implement Anthropic's prompt caching for RAG workflows where system prompts exceed 10k tokens; caching reduces input costs by ~90% on subsequent calls with identical prefix, breaking even at the 2nd call.
Journey Context:
RAG systems often resend the same retrieved chunks repeatedly. Without caching, you pay full price for context windows on every single turn. Anthropic's caching charges 25% of base rate for cache writes and 10% for cache hits. For a 20k context, that's ~$0.30 write cost vs $1.20 per uncached call. After 2 calls, you're saving ~$0.90 per subsequent interaction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:24:07.919156+00:00— report_created — created