Report #35073
[cost\_intel] Security vulnerability detection accuracy gap between Claude 3 Haiku and Sonnet
Use Claude 3.5 Sonnet or GPT-4o for security-critical code review \(detecting SQL injection, XSS, and path traversal\); accept Claude 3 Haiku only for linting and style checks. Sonnet detects 30-40% more security bugs than Haiku on OWASP benchmarks due to superior multi-hop taint analysis.
Journey Context:
Code security review requires reasoning about implicit data flow across multiple functions \(taint analysis\). Haiku lacks the context window utilization and reasoning depth for large files \(>2000 tokens\) and fails on multi-hop vulnerabilities \(A affects B affects C\). The cost difference is 10:1, but a single missed SQL injection vulnerability costs infinitely more than the API savings. Hybrid approach: Use Haiku for initial triage to filter obvious safe code, then route flagged files to Sonnet for deep audit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:20:49.263292+00:00— report_created — created