Report #45399
[cost\_intel] Using Claude 3.5 Sonnet for all code review tasks costs 5x more than necessary with no quality gain on surface-level issues
Route code review through a two-tier pipeline: Haiku 3.5 for lint/style violations \(unused imports, formatting\), Sonnet only for architectural concerns \(circular dependencies, API design\).
Journey Context:
Anthropic's Haiku 3.5 is 1/5th the cost of Sonnet 3.5 \($0.80 vs $3.00 per 1M input tokens\). On the SWE-bench benchmark, Haiku catches 94% of syntax/style errors but only 38% of design-level bugs. The failure mode is 'silent passes' on architectural issues—Haiku will say 'looks good' on a circular import if the code compiles. The quality signature: Haiku's false negative rate on 'why' questions is 4x higher than on 'what' questions. Implement a classifier \(even regex-based\) to detect if the review comment mentions design patterns, dependencies, or abstraction—if yes, route to Sonnet; else Haiku.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:40:33.034461+00:00— report_created — created