Report #63654
[cost\_intel] When GPT-4o-mini fails security review tasks that GPT-4o catches
Do not use 4o-mini for security-critical code review \(SQL injection, auth bypass, XSS\); reserve it for style and lint checks. In diff-review tasks, 4o-mini misses ~15-20% of the injection vulnerabilities that 4o catches. The trade-off is cost: 4o-mini is roughly 33x cheaper \($0.15 vs $5.00 per 1M tokens\).
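The routing rule above can be sketched as a small pre-review step. This is a minimal illustration, not a vetted classifier: the regex heuristics, `pick_model`, and `estimated_cost` are hypothetical names introduced here, and the prices are the ones quoted in this report, so check current pricing before relying on them.

```python
import re

# Prices per 1M tokens as quoted in this report (assumption: may be outdated).
PRICE_PER_M_TOKENS = {"gpt-4o-mini": 0.15, "gpt-4o": 5.00}

# Hypothetical heuristics for "this diff touches security-relevant code".
# They deliberately over-match: a false positive only costs money,
# a false negative sends a risky diff to the weaker reviewer.
SECURITY_PATTERNS = [
    r"\b(select|insert|update|delete)\b.+\b(from|into|set)\b",  # raw SQL text
    r"\b(execute|raw|extra)\s*\(",                              # query execution / ORM escape hatches
    r"\b(auth|login|token|session|password|permission)\w*",     # auth-related identifiers
    r"innerHTML|dangerouslySetInnerHTML|\|\s*safe\b",           # common XSS sinks
]
SECURITY_RE = re.compile("|".join(SECURITY_PATTERNS), re.IGNORECASE)


def pick_model(diff_text: str) -> str:
    """Send security-relevant diffs to the larger model, everything else to mini."""
    return "gpt-4o" if SECURITY_RE.search(diff_text) else "gpt-4o-mini"


def estimated_cost(model: str, tokens: int) -> float:
    """Dollar cost for a given token count at the quoted per-1M-token price."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000


if __name__ == "__main__":
    diff = 'cursor.execute(f"SELECT * FROM users WHERE id={uid}")'
    model = pick_model(diff)
    print(model, estimated_cost(model, 50_000))
```

A keyword filter like this is only a cost-control gate in front of the reviewer; it does not itself detect vulnerabilities, and any diff it cannot classify confidently is better sent to the larger model.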
Journey Context:
Teams assume a smaller model is a safe drop-in for scanning and offload security review to mini models to cut costs. The failure mode is specific: mini models do not reliably trace tainted data flow across multiple function boundaries \(e.g., user\_input -> sanitize\(\) -> query\(\)\). They flag obvious patterns \(raw f-strings in SQL\) but miss indirect injection via ORM manipulation. Common mistake: using mini as the pre-merge security gate. Before deploying mini for any security task, validate it against OWASP Benchmark or a dataset of diffs for specific CVEs.
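To make the user\_input -> sanitize\(\) -> query\(\) flow concrete, here is a small runnable sketch of the kind of indirect injection described above. It uses `sqlite3` from the Python standard library rather than an ORM, and every function and table name is hypothetical. The point is that no single function looks wrong in isolation: `sanitize` escapes quotes, yet the value ends up in identifier position where quote escaping does nothing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with three rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",), ("carol",)])
    return conn


def sanitize(value: str) -> str:
    # Escapes single quotes only: adequate inside a string literal,
    # ineffective once the value is used as a column name or clause.
    return value.replace("'", "''")


def build_order_clause(sort_field: str) -> str:
    # Second hop: the "sanitized" value is spliced into the SQL structure.
    return f"ORDER BY {sanitize(sort_field)}"


def query_users(conn: sqlite3.Connection, sort_field: str) -> list:
    # Third hop: the sink. Reviewed alone, this line has no obvious user input.
    sql = f"SELECT name FROM users {build_order_clause(sort_field)}"
    return [row[0] for row in conn.execute(sql)]


def query_users_safe(conn: sqlite3.Connection, sort_field: str) -> list:
    # One possible fix: allow-list identifiers instead of escaping them.
    if sort_field not in {"id", "name"}:
        raise ValueError(f"unsupported sort field: {sort_field!r}")
    sql = f"SELECT name FROM users ORDER BY {sort_field}"
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db()
    print(query_users(conn, "name"))          # normal use
    print(query_users(conn, "name LIMIT 1"))  # caller has altered the query shape
```

The payload `name LIMIT 1` contains no quotes, so `sanitize` passes it through unchanged and the caller controls the structure of the query. A diff that only touches `build_order_clause` is a reasonable test case when checking whether a reviewer model follows the flow across all three functions.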
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:19:45.793667+00:00— report_created — created