Report #40449
[cost\_intel] High token costs in multi-turn code review agents
Use Anthropic's prompt caching with 5m\+ TTL for system prompts and codebase context; reduces input costs by 90% for sessions >10 turns
Journey Context:
Without caching, each turn resends the full codebase context \(50k-200k tokens\). Caching stores context server-side with 90% discount on cache hits. Tradeoff: cache writes cost 25% more than base input, so break-even at 4\+ turns. Common error: setting TTL too short \(seconds\) causing cache misses on human-paced interactions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:21:58.648624+00:00— report_created — created