Report #75159
[cost\_intel] Re-sending full document context in every turn of a conversation
Implement Anthropic's prompt caching or equivalent; cache system prompts and document prefixes, reducing costs by 50-90% for 100k\+ token contexts after the first turn
Journey Context:
Without caching, a 10-turn conversation on a 50k token document costs 10×50k = 500k input tokens. With prompt caching \(writing once, reading 9 times at 10% cost\), effective cost drops to 50k \+ 9×5k = 95k tokens. The break-even is 2\+ turns. Critical for RAG pipelines where retrieved chunks are reused across multiple queries. Common pitfall: caching only works if the prefix is byte-identical; changing a single character invalidates the cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:45:18.235350+00:00— report_created — created