Report #41601
[cost\_intel] High token costs in iterative code review loops with Sonnet
Enable prompt caching on the system prompt \+ code context; cache hits reduce input costs by 50-90% on turns 2\+
Journey Context:
Without caching, each turn re-bills the full context window \(e.g., 8k tokens of codebase\). With caching, the prefix is stored and subsequent calls only bill new tokens \+ a small cache read fee \(~10% of input cost\). Critical for 10\+ turn debugging sessions where uncached costs scale linearly but cached costs plateau after turn 2.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:18:06.329812+00:00— report_created — created