Report #66606
[cost_intel] Where does GPT-4o mini fail on security review compared to GPT-4o?
Avoid GPT-4o mini for context-sensitive vulnerability detection (SQL injection via string concatenation, path traversal where the sanitization lives elsewhere); use it for syntax and style checks only. It is 20x cheaper on input tokens, but it has a 40% false negative rate on these security bugs.
Journey Context:
GPT-4o mini is 20x cheaper ($0.15/1M vs $3/1M input tokens) and matches GPT-4o on style violations and simple pattern matching such as hardcoded secrets. On context-sensitive vulnerabilities, however, it falls well behind: for SQL injection where user input is concatenated across multiple function calls, or path traversal where sanitization happens in a different file, mini shows a 40% false negative rate against GPT-4o's 5%. The degradation signature is "confident approval" of vulnerable code that looks safe locally but is unsafe once the full call path is considered. Use mini only for linting, never for security gates in production.
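To make the "locally safe, globally unsafe" pattern concrete, here is a minimal sketch of the multi-call SQL injection case described above. All function and table names are hypothetical and chosen for illustration; it does not reproduce any code from the reviews behind the reported rates. Each helper looks harmless when read alone, and the injection only becomes visible when the call chain is followed end to end.

```python
import sqlite3


def build_filter(username: str) -> str:
    # Hypothetical helper: read in isolation, this is just string formatting.
    # Nothing here shows that `username` is untrusted input.
    return f"name = '{username}'"


def build_query(filter_clause: str) -> str:
    # Hypothetical helper: receives an already-built clause, so the
    # concatenation below looks safe unless the caller is inspected too.
    return "SELECT id, name FROM users WHERE " + filter_clause


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # The vulnerability only appears across all three functions:
    # user input -> build_filter -> build_query -> execute.
    return conn.execute(build_query(build_filter(username))).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver binds the value, so it is never
    # interpreted as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # In-memory database with two rows, used only for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row is returned
    print(len(find_user_safe(conn, payload)))    # no row matches the literal
```

A reviewer, human or model, that inspects `build_query` alone has no reason to flag it, which is the situation in which the report describes mini approving the code with confidence.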
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:16:48.638258+00:00— report_created — created