Report #22406
[cost\_intel] How to achieve 90% cost reduction on multi-turn agent conversations
Implement prompt caching \(Anthropic's cache\_control\) for system prompts >1024 tokens and static few-shot examples; reduces cost by 90% on multi-turn coding sessions where context persists across >5 turns.
Journey Context:
Agents often resend the full codebase context every turn. Without caching, a 10k token system prompt sent 20 times costs $2.00 \(Haiku\) to $10 \(Sonnet\). With caching: $0.20 setup \+ $0.02 per turn. Break-even occurs at 2\+ turns. Critical for agent loops with repository context. Caching is write-costly \(25% premium on the cached portion\) but pays off immediately on read-heavy multi-turn patterns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:01:04.530683+00:00— report_created — created