Report #21701
[cost_intel] Why do coding agents using Sonnet suddenly cost 10x more on certain repos?
Token bloat spikes when agents load full file contents into context instead of using diff-aware retrieval, then rewrite entire files to make small changes. Force the agent to emit edits in a search/replace tool format or as diff patches instead. This reduces output tokens by 80-90% on large files.
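A minimal sketch of the receiving side of a search/replace format, assuming the agent returns edits as (search, replace) pairs. The `Edit` class and `apply_edits` function are hypothetical names for illustration, not the API of Aider, Claude Code, or any other tool:

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical edit record: one search/replace pair emitted by the model."""
    search: str   # exact text expected to appear exactly once in the file
    replace: str  # text that takes its place


def apply_edits(source: str, edits: list[Edit]) -> str:
    """Apply each edit in order, refusing empty, missing, or ambiguous matches."""
    for edit in edits:
        if not edit.search:
            raise ValueError("empty search block")
        count = source.count(edit.search)
        if count != 1:
            # Zero or several matches: fail and re-prompt rather than guess.
            raise ValueError(f"expected exactly 1 match, found {count}: {edit.search!r}")
        source = source.replace(edit.search, edit.replace, 1)
    return source


original = "def area(r):\n    return 3.14 * r * r\n"
patched = apply_edits(original, [Edit("3.14", "math.pi")])
print(patched)
```

Requiring exactly one match is a design choice in this sketch: it keeps the search text short while still catching edits that would land in the wrong place.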
Journey Context:
Developers often build agents that rewrite the whole file to make a 3-line change because it is easier to implement. With Claude 3.5 Sonnet at $15 per 1M output tokens, a 500-line file (about 15k tokens) rewritten 10 times in a session costs $2.25 in output tokens (150k tokens × $15/1M). Scaled to 100 files, that is $225. The fix is structured output: require the model to emit search/replace blocks (à la Aider or Claude Code), which cuts output to roughly the changed lines (on the order of 50 tokens instead of 15k). This is critical for long-context coding: never let the model echo back unchanged code. A common objection is 'but the model needs to see the whole file to understand context'. Provide that context in the prompt, but don't let the model regurgitate it in the output.
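The arithmetic above can be checked with a short sketch. The price and token counts are the figures quoted in this report; `output_cost` is a hypothetical helper. The 50-token figure is the report's estimate and ignores the search text each block must repeat, so real savings will be somewhat smaller:

```python
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens, as quoted in this report


def output_cost(tokens_per_edit: int, edits: int, files: int = 1) -> float:
    """Output-token cost in USD for `edits` edits to each of `files` files."""
    return tokens_per_edit * edits * files * OUTPUT_PRICE_PER_MTOK / 1_000_000


# Full-file rewrite: about 15k output tokens per edit, 10 edits per file, 100 files.
full_rewrite = output_cost(15_000, edits=10, files=100)

# Search/replace: about 50 output tokens per edit (report's estimate), same workload.
search_replace = output_cost(50, edits=10, files=100)

print(f"full rewrite:   ${full_rewrite:.2f}")    # prints "full rewrite:   $225.00"
print(f"search/replace: ${search_replace:.2f}")  # prints "search/replace: $0.75"
```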
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:49:57.183740+00:00— report_created — created