Report #30874
[cost_intel] How to reduce costs for conversational agents with long context?
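Implement prompt caching (Anthropic) or context caching (Gemini) for system prompts and RAG context. Cached prefix tokens are billed at roughly 10% of the normal input rate, so input costs on the cached portion can drop by up to about 90% in sessions of more than 10 turns where the context exceeds 10k tokens.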
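As a rough sketch of what marking a cacheable prefix can look like, the helper below builds the keyword arguments for an Anthropic Messages API request and places `cache_control` breakpoints on the last tool definition and on the RAG context block. The function name `build_cached_request`, the model placeholder, and the `max_tokens` value are hypothetical choices for illustration; check the provider documentation for current minimum cacheable lengths and breakpoint limits before relying on it.

```python
import copy


def build_cached_request(system_prompt, rag_context, tools, messages,
                         model="MODEL_NAME_PLACEHOLDER"):
    """Build kwargs for a Messages API call with cache breakpoints.

    Hypothetical helper: only the `cache_control` field and the block
    layout follow the provider's request format; everything else is an
    illustrative assumption.
    """
    request = {
        "model": model,          # placeholder, substitute a real model id
        "max_tokens": 1024,      # arbitrary value for the sketch
        "system": [
            # Static instructions first, so they are part of the prefix.
            {"type": "text", "text": system_prompt},
            # Breakpoint on the last static block: everything up to and
            # including this block becomes the cacheable prefix.
            {
                "type": "text",
                "text": rag_context,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # The changing conversation goes after the cached prefix.
        "messages": messages,
    }
    if tools:
        cached_tools = copy.deepcopy(tools)  # do not mutate the caller's list
        # Marking the last tool caches the whole tool-definition block.
        cached_tools[-1]["cache_control"] = {"type": "ephemeral"}
        request["tools"] = cached_tools
    return request
```

The resulting dictionary could be passed as `client.messages.create(**request)` with the official SDK. The key design point is ordering: anything that changes per turn has to come after the last breakpoint, otherwise the prefix no longer matches and the cache misses.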
Journey Context:
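Without caching, every turn resends the full context and pays the full input rate for it. A cache hit on the shared prefix is billed at ~10% of the input rate. The trap is assuming that caching helps single-turn tasks: the first request still pays at least the full rate to populate the cache, and the savings only amortize when later turns reuse the same prefix. Break-even is typically around turn 3-4 with 8k+ context. Many developers miss that tool definitions and few-shot examples are ideal cache candidates, since they rarely change between turns.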
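The amortization argument can be made concrete with a small cost model. This is a simplified sketch, not a billing calculator: it assumes a fixed-size cached prefix, a 10% read rate as stated above, and a cache-write premium of 1.25x the input rate (an assumption, since the report does not give a write price). It ignores output tokens, the growing uncached conversation history, storage fees, and cache expiry between turns, all of which push the real break-even later.

```python
def savings_fraction(turns, write_mult=1.25, read_mult=0.10):
    """Fraction of prefix input cost saved by caching over a session.

    Costs are expressed in units of "one uncached pass over the prefix".
    write_mult is an ASSUMED premium for the first (cache-writing) turn;
    read_mult is the ~10% cache-hit rate quoted in the report.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    uncached = float(turns)                           # full price every turn
    cached = write_mult + (turns - 1) * read_mult     # one write, then reads
    return 1.0 - cached / uncached


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 50):
        print(f"{n:>3} turns: {savings_fraction(n):+.1%} of prefix input cost")
```

Under these assumed multipliers a single turn comes out more expensive than not caching at all, and the saving approaches but never reaches 90% as the turn count grows, which is why the headline figure only applies to long sessions.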
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:12:19.185156+00:00— report_created — created