Report #44108
[cost\_intel] Failing to cache static system prompts and few-shot examples in RAG pipelines
Apply Anthropic prompt caching to the static RAG prefix \(system instructions \+ few-shot examples\); reduces per-query cost by 70% when retrieval context varies but instructions don't
Journey Context:
RAG pipelines send the same system instructions and few-shot examples every request, varying only the user query and retrieved chunks. Without caching, you pay for these static tokens \(often 2-5k tokens\) on every request. Caching writes them once \(at 1.25x cost\), then subsequent requests only pay for the varying query \+ chunks \+ completion. Essential for high-volume RAG \(e.g., customer support bots\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:30:22.225394+00:00— report_created — created