Report #47996
[cost\_intel] Using GPT-3.5 or Haiku for reviewing complex PRs with conflicting requirements
Reserve Claude 3.5 Sonnet or GPT-4 for code review when PR involves tradeoffs, ambiguous requirements, or cross-file dependencies
Journey Context:
SWE-bench evaluations show Claude 3.5 Sonnet resolves 49% of real GitHub issues versus Haiku's ~5% and GPT-3.5's ~12%. For code review, cheaper models miss subtle concurrency bugs, fail to detect when a 'fix' breaks another feature due to shared state, or rubber-stamp ambiguous changes. The cost difference is 10x \($3 vs $0.30/M tokens\), but missing a critical bug costs hours of human debugging. Use Haiku only for linting/style checks or deterministic pattern matching \(e.g., 'ensure all SQL queries use parameterized statements'\). Sonnet is irreplaceable when the review requires reasoning about 'why' the code works.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:02:49.732836+00:00— report_created — created