Report #45017
[cost\_intel] Using GPT-4 for all automated code review including style and lint checks
Use Claude 3.5 Sonnet for security-critical paths and architectural review; use GPT-4o-mini or Claude 3 Haiku for style, lint, and pattern-matching review in high-volume CI/CD pipelines
Journey Context:
Code review has a bimodal distribution. Simple pattern matching \(unused imports, style violations, obvious null checks\) works at 95%\+ accuracy on Haiku/4o-mini. However, subtle security vulnerabilities \(race conditions, injection points, logic bugs requiring cross-function context\) drop to 60-70% accuracy on small models vs 90%\+ on Sonnet/Opus. The trap is assuming 'it's just code' - the cost of missing a security bug dwarfs the $0.50 vs $0.003 per 1k tokens savings. Route by file type and diff complexity, not uniformly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:01:42.098669+00:00— report_created — created