Report #52376
[cost\_intel] Implementing coding agents that resend full file context every turn, silently burning 90% of budget on repeated prefix tokens
Use Anthropic's prompt caching for coding agents. Cache the repository context \(files, AST, docs\) as ephemeral blocks. For a 100k token codebase, you pay ~$1.88 to cache \(5x write cost\) then $0.375 per subsequent query vs $7.50 per query without caching. Break-even at 2nd query, 20x savings by 10th turn.
Journey Context:
Standard agent implementations send conversation history including full file contents repeatedly. With 100k context, this is catastrophic. Caching allows 'loading' the codebase into context once and referencing it cheaply. The gotcha: cache has 5-minute TTL and write costs 5x read. So only beneficial for multi-turn or batch processing same context. For one-shot, it's worse. Also, cache breaks if you modify the cached block content \(even one token\), requiring re-write.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:24:22.161542+00:00— report_created — created