Report #86554
[cost_intel] Using reasoning models for small code diffs wastes latency and budget
Route diffs of more than 150 lines, or spanning more than 3 files, to a reasoning model (o1/o3). For smaller changes, GPT-4o with a task-specific prompt matches review quality at roughly 20x lower latency.
Journey Context:
Reasoning models show a significant accuracy advantage (40%+) on multi-file architectural reviews where bugs span 5+ files (e.g., interface changes that break distant implementations). For single-file changes under 100 lines, however, GPT-4o achieves equivalent bug detection rates (within 5%) at 20x lower latency and 30x lower cost. The error is treating "code review" as a single monolithic task. Pattern: use the cheap model for lint-level checks and single-file logic; reserve the reasoning model for "cross-boundary" changes that involve dependency graphs.
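A minimal sketch of the routing rule, assuming the 150-line and 3-file thresholds from the summary and reading "lines" as added plus removed lines in a unified diff. The names `diff_stats` and `pick_model` are hypothetical, and the model strings are labels taken from this report, not verified API identifiers.

```python
from dataclasses import dataclass

# Thresholds from the report summary; tune them for your own codebase.
MAX_CHANGED_LINES = 150
MAX_FILES = 3

# Labels only (assumption): replace with the identifiers your provider expects.
REASONING_MODEL = "o3"
FAST_MODEL = "gpt-4o"


@dataclass
class DiffStats:
    files: int
    changed_lines: int


def diff_stats(unified_diff: str) -> DiffStats:
    """Count touched files and added/removed lines in a unified diff.

    Heuristic: each '+++ ' header counts as one file. A removed line whose
    own text begins with '-- ' would be mistaken for a header and skipped.
    """
    files = 0
    changed = 0
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            files += 1
        elif line.startswith("--- "):
            continue  # old-file header, not a removed line
        elif line.startswith(("+", "-")):
            changed += 1
    return DiffStats(files=files, changed_lines=changed)


def pick_model(stats: DiffStats) -> str:
    """Send large or multi-file diffs to the reasoning model, the rest to the fast one."""
    if stats.changed_lines > MAX_CHANGED_LINES or stats.files > MAX_FILES:
        return REASONING_MODEL
    return FAST_MODEL
```

Size is only a proxy for the "cross-boundary" property described above. A small diff that changes a widely implemented interface may still deserve the reasoning model, so a real router would probably add a dependency check on top of these counts.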
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:52:18.179342+00:00— report_created — created