Report #65391
[cost\_intel] Autonomous coding agents burn 70% of budget re-processing the same file context across 10\+ turn conversations
Implement Anthropic's prompt caching with 5-minute TTL for the conversation history \+ file tree context in multi-turn coding agents. Cache the context window for 5-minute TTL, yielding 50-70% cost reduction on turns 3-15.
Journey Context:
In autonomous coding agents \(SWE-agent, Devin-style\), each turn sends the entire conversation history \+ file context to the API. By turn 10, 80% of the context is static \(files read in turn 1, system instructions\) but is re-billed at full price. Anthropic's prompt caching allows marking the initial system prompt \+ file context as 'ephemeral' with a TTL. The cache write costs 25% of base, but cache hits cost only 10%. For a 100k context agent loop: uncached turn = $0.80, cached hit = $0.08. Common error is implementing 'sliding window' truncation to save tokens, which destroys agent coherence by dropping early file reads. The degradation signature is 'amnesia'—the agent re-reading files it already saw because the context was truncated to save costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:14:18.504750+00:00— report_created — created