Report #86112
[cost_intel] GPT-4o-mini misses architectural vulnerabilities in large diffs, outweighing its cost savings
Use GPT-4o-mini for single-file changes under 100 lines; mandate GPT-4o for multi-file PRs over 300 lines or with cross-file dependencies. On large diffs, mini misses security issues and race conditions at 3x GPT-4o's rate, creating downstream remediation costs that dwarf the token savings.
Journey Context:
Engineering teams adopt 4o-mini expecting a 15x cost reduction on code review pipelines. For trivial changes (variable renames, single-function refactors), mini performs within 5% of 4o on bug detection. On architectural changes (new API endpoints with auth middleware, database migration scripts), however, mini exhibits 'local optimization blindness': it approves syntactically correct code that violates cross-file invariants (e.g., using deprecated auth patterns that have since been refactored in /auth/utils.js). The cost of catching this in production (a security incident, data corruption) exceeds the $0.50 saved on the API call. The breakpoint is file and line count: below 3 files and 100 lines, mini is safe; beyond that, the tail risk justifies 4o.
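The cost argument can be made concrete as an expected-cost comparison: each review costs its API price plus the miss rate times the cost of remediating whatever slips through. This Python sketch takes the report's 3x miss multiplier for large diffs as its default; the function names and any API prices, miss rates, or remediation costs passed in are hypothetical inputs, not measured figures.

```python
def expected_cost(api_cost: float, miss_rate: float, remediation_cost: float) -> float:
    """API spend plus the expected cost of an issue the reviewer misses."""
    return api_cost + miss_rate * remediation_cost


def prefer_full_model(
    api_cost_4o: float,
    api_cost_mini: float,
    miss_rate_4o: float,
    remediation_cost: float,
    miss_multiplier: float = 3.0,  # report: mini misses 3x as often on large diffs
) -> bool:
    """True when GPT-4o has the strictly lower expected cost for this diff."""
    cost_4o = expected_cost(api_cost_4o, miss_rate_4o, remediation_cost)
    cost_mini = expected_cost(api_cost_mini, miss_multiplier * miss_rate_4o, remediation_cost)
    return cost_4o < cost_mini


def breakeven_remediation(savings: float, miss_rate_4o: float, miss_multiplier: float = 3.0) -> float:
    """Remediation cost above which GPT-4o's extra catches outweigh mini's savings.

    Solves savings = (miss_multiplier - 1) * miss_rate_4o * remediation_cost.
    """
    extra_miss_rate = (miss_multiplier - 1.0) * miss_rate_4o
    if extra_miss_rate <= 0:
        raise ValueError("mini must miss strictly more often than GPT-4o")
    return savings / extra_miss_rate
```

With the report's $0.50 savings and a hypothetical 2% GPT-4o miss rate, `breakeven_remediation(0.50, 0.02)` comes to about $12.50: any missed issue whose remediation costs more than that makes GPT-4o the cheaper choice in expectation.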
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:07:34.802712+00:00 | report_created | created