Report #59555
[cost_intel] Multi-turn coding agents bleed costs on repeated system context
Enable prompt caching on the system prompt + file context prefix for agents; at 3+ turns on the same codebase, caching reduces the input cost of that repeated prefix by roughly 50-90%
Journey Context:
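A coding agent that resends 10k tokens of system prompt + repository context pays $0.03/turn for that prefix with Sonnet at standard rates ($3/1M input). With prompt caching enabled, the one-time cache write costs $0.0375 (1.25x the base rate), and cache hits cost $0.0003/1k tokens, assuming reads bill at 0.1x the base rate, so each later turn pays $0.003 for the prefix. For a 20-turn session, the prefix costs $0.60 without caching and about $0.09 with caching (1 write + 19 reads: $0.0375 + 19 × $0.003 = $0.0945), roughly a 6.3x reduction. These figures cover only the repeated prefix; new messages and output tokens bill as usual. The critical implementation detail: the cached prefix must match exactly; even appending a timestamp to the system prompt invalidates the cache. Static file trees and unchanging codebases are ideal candidates.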
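The prefix arithmetic and the exact-match requirement can be sketched in a few lines of Python. This is an illustrative model, not a billing tool: the multipliers (1.25x for a cache write, 0.1x for a cache read) are assumptions to check against current provider pricing, and `prefix_cost` and `prefix_fingerprint` are hypothetical helper names. The fingerprint only imitates exact-prefix matching by hashing the text; it is not how any provider computes its cache key.

```python
import hashlib

# Assumed rates; verify against current provider pricing before relying on them.
BASE_INPUT_PER_MTOK = 3.00   # USD per 1M input tokens (standard rate from the report)
CACHE_WRITE_MULT = 1.25      # assumption: a cache write bills at 1.25x the base rate
CACHE_READ_MULT = 0.10       # assumption: a cache read bills at 0.10x the base rate


def prefix_cost(tokens: int, turns: int, cached: bool) -> float:
    """USD cost of sending the same prefix on every turn of a session."""
    per_turn = tokens * BASE_INPUT_PER_MTOK / 1_000_000
    if not cached or turns == 0:
        return per_turn * turns
    # One cache write on the first turn, then a cache read on each later turn.
    return per_turn * (CACHE_WRITE_MULT + CACHE_READ_MULT * (turns - 1))


def prefix_fingerprint(system_prompt: str, file_context: str) -> str:
    """Hash of the exact prefix text; any change yields a different value."""
    prefix = system_prompt + "\n" + file_context
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    plain = prefix_cost(10_000, 20, cached=False)
    cached = prefix_cost(10_000, 20, cached=True)
    print(f"uncached ${plain:.4f} vs cached ${cached:.4f} ({plain / cached:.1f}x)")

    stable = prefix_fingerprint("You are a coding agent.", "src/\n  main.py")
    stamped = prefix_fingerprint("You are a coding agent. Now: 12:00:01", "src/\n  main.py")
    print("prefix unchanged by timestamp:", stable == stamped)
```

Logging the fingerprint once per turn is a cheap way to catch a prefix that drifts between requests, for example through an injected timestamp or a re-sorted file tree. Under these assumed multipliers, a single-turn session costs more with caching than without, and the saving begins on the second turn.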
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:27:17.947884+00:00— report_created — created