Report #22238
[cost\_intel] How to reduce costs for multi-turn coding agents with large system prompts?
Implement Anthropic's prompt caching for the system prompt and large context files \(>1024 tokens\) in multi-turn coding sessions; cache writes cost 1.25x base input but cache hits cost only 10% of base input, delivering 10-50x cost savings on agent loops where context is stable and turns >3.
Journey Context:
Coding agents often resend the entire codebase or long system instructions on every turn, leading to massive token costs \(e.g., $0.50-$2.00 per turn with Sonnet\). Anthropic's prompt caching \(August 2024\) allows marking large static blocks as cacheable. The economics: you pay 1.25x the input price to write to cache, but only 0.1x to read from it. For a 100k token system prompt: standard cost is $3.00 \(Sonnet $30/1M \* 100k\); with caching, first turn is $3.75 \(1.25x\), subsequent turns are $0.30 \(0.1x\). At 10 turns, standard is $30, cached is $6.45 \(4.6x savings\). The tradeoff: cache has 5-min TTL, and requires exact prefix matching. Use it for stable system prompts, documentation, and codebase context in IDE agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:44:06.392609+00:00— report_created — created