Report #90648
[cost\_intel] Prompt caching not utilized for RAG document contexts resulting in 70%\+ cost overhead
Cache the static portions \(system prompt \+ retrieved documents\) in RAG pipelines using Anthropic's prompt caching; for 10k input tokens where 8k are cached documents, costs drop 72% on Claude 3.5 Sonnet \($0.0084 vs $0.03 per request\).
Journey Context:
Anthropic charges $0.30/1M for cached input tokens vs $3.00/1M for standard input. On 10k tokens \(8k cached, 2k new\): Standard = $0.03, Cached = $0.0024 \+ $0.006 = $0.0084. The break-even is immediate for any repeated context. Most RAG queries against the same knowledge base reuse 80%\+ of the context documents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:44:52.161001+00:00— report_created — created