Report #55530
[cost\_intel] Paying full input token cost on every turn of long-context code review or legal analysis
Enable prompt caching for sessions >10k tokens; cache the initial codebase snapshot or legal document corpus. On Anthropic API, this yields 90% input cost reduction after the first turn \(subsequent reads cost 0.1x base\). Break-even at turn 2; by turn 5, total cost is 40% of non-cached.
Journey Context:
Standard multi-turn code review sends full PR diff \(15k tokens\) every turn. Without caching, 5 turns = 75k input tokens at full price. With caching: turn 1 pays 1.25x to write cache, turns 2-5 pay 0.1x to read. Total: 15k\*1.25 \+ 60k\*0.1 = 18.75k \+ 6k = 24.75k effective cost vs 75k \(3x savings\). Critical constraint: caching requires exact prefix match; if you append new messages but modify the cached system prompt or document, you pay full price for the new version. Anti-pattern: alternating between two slightly different system prompts; this doubles cache storage cost without benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:42:12.566710+00:00— report_created — created