Report #20993
[cost\_intel] Claude 3.5 Haiku matches Sonnet 4 on structured code review tasks within 5%
Use Claude 3.5 Haiku for diff review tasks with explicit rubrics \(security checks, style enforcement, pattern matching\) where outputs are constrained; reserve Sonnet 4 for ambiguous architectural reviews. Haiku achieves 95% of Sonnet's accuracy on structured tasks at 10% of the cost \($0.25/1M vs $3.00/1M input\).
Journey Context:
Teams default to Sonnet for all code review agents assuming 'bigger is better,' burning 10x token costs. Haiku 3.5 matches Sonnet within 3-5% on structured tasks \(regex validation, known anti-pattern detection\) because the task is retrieval-heavy, not reasoning-heavy. The failure mode is 'false confidence'—Haiku will confidently approve code that violates implicit architectural constraints not explicitly in the prompt. Sonnet catches these via latent reasoning. The economic inflection point: if your prompt includes >5 explicit criteria with yes/no outputs, use Haiku; if the criteria require 'it depends' reasoning, use Sonnet.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:38:41.174386+00:00— report_created — created