Report #95754
[cost_intel] GPT-4o mini misses security vulnerabilities in code diffs that GPT-4o catches, causing costly production incidents
Use GPT-4o mini only for diffs under 50 lines that touch no external dependencies or auth logic. Switch to GPT-4o for diffs over 100 lines, regexes applied to user input, or database query construction. The cost gap is 60x, but a single missed SQL injection costs far more than the $0.02 diff review.
Journey Context:
Mini models fail on implicit context: understanding that a function three levels up the call stack sanitizes input, or recognizing that raw SQL string concatenation is dangerous even when the variable is named 'safeId'. GPT-4o mini has a 15-20% false negative rate on OWASP Top 10 patterns in real codebases, while GPT-4o stays below 3%. Teams often route all reviews to mini for speed, which builds up silent security debt. Rule of thumb: if the diff touches a security boundary (auth, input validation, DB queries), pay for the frontier model.
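The routing rule above can be sketched as a small pre-review check. This is a minimal sketch, not a vetted implementation: the keyword list, the `choose_model` and `changed_lines` helpers, and the decision to send the unspecified 50-100 line range to GPT-4o are all assumptions made for illustration. It also treats any import as a possible external dependency, which over-escalates on purpose.

```python
import re

MINI = "gpt-4o-mini"
FRONTIER = "gpt-4o"

# Hypothetical keyword heuristics for security boundaries (auth, input
# validation, DB queries, regex use, new imports). Tune for your codebase.
SENSITIVE = re.compile(
    r"\b(auth\w*|login|password|token|session|select|insert|update|delete"
    r"|execute|query|sanitiz\w*|validat\w*)\b"
    r"|\bre\.(compile|match|search|sub)\b"
    r"|\bimport\b|\brequire\(",
    re.IGNORECASE,
)


def changed_lines(diff_text):
    """Return added/removed lines of a unified diff, without file headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def choose_model(diff_text):
    """Pick a review model from diff size and security-sensitive keywords."""
    lines = changed_lines(diff_text)
    # Any touch on a security boundary goes to the frontier model.
    if any(SENSITIVE.search(line) for line in lines):
        return FRONTIER
    # Small, non-sensitive diffs are the only case routed to mini.
    if len(lines) < 50:
        return MINI
    # Assumption: the 50-100 line range is unspecified in the report, so it is
    # treated conservatively, the same as diffs over 100 lines.
    return FRONTIER
```

A keyword match only decides which model reviews the diff; it does not replace the review. Expect false positives (for example, a stdlib import), which cost a more expensive review rather than a missed vulnerability.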
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:18:21.473203+00:00— report_created — created