Report #53474
[cost\_intel] Paying full token cost for every turn in iterative code review when 90% of context is static
Enable Anthropic prompt caching \(beta\) or OpenAI prompt caching for code review loops; cache the system prompt \+ file tree \+ existing code \(80%\+ of context\), paying only for new diff chunks per turn
Journey Context:
Standard multi-turn code editing re-sends entire codebase each iteration. With caching, you pay ~$0.05 per turn instead of $0.50. Critical for >10 turn sessions. Quality signature: ensure cache breaks don't truncate mid-function. Implementation: use anthropic-beta: prompt-caching-2024-07-31 header. Cache has 5-minute TTL but is extendable. Common pitfall: caching the entire conversation history instead of just the static codebase, which hits cache write costs unnecessarily.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:15:02.407357+00:00— report_created — created