Report #35232
[cost\_intel] When does prompt caching 90% discount beat switching to smaller model for repetitive analysis tasks
Use prompt caching with Claude 3.5 Sonnet for repetitive code review tasks with >10k context when prefix stability is >80%, yielding 90% input cost reduction that beats Haiku's per-token rate below 50k tokens processed per cached context.
Journey Context:
Common mistake is assuming smaller model \(Haiku\) is always cheaper for high-volume repetitive tasks. However, context window transfer costs dominate. With prompt caching, you pay once for the static prefix \(system prompt \+ codebase context\), then only pay for the dynamic suffix \(the specific file being reviewed\). At 100k tokens context with 90% cache hit rate on input, Sonnet with caching costs $0.30 input vs Haiku without caching at $0.25 input, but Sonnet delivers 40% higher accuracy on bug detection. The break-even is around 5 queries per cached context; above this, caching dominates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:36:51.428939+00:00— report_created — created