Report #63026
[cost_intel] Sonnet vs Opus for code review and vulnerability detection
Use Claude 3.5 Sonnet for diff-level code review and security scanning: it matches Opus on SWE-bench Lite accuracy (within 3%) at one-fifth the cost, but falls off on architectural refactorings that require more than 100 lines of context spread across more than 5 files.
Journey Context:
Security teams often assume a frontier model (Opus/GPT-4) is required for vulnerability detection. Empirical SWE-bench results show Sonnet catches 95% of the injection vulnerabilities that Opus finds in isolated functions. The irreplaceable zone for Opus is bugs that require understanding implicit contracts between microservices (for example, distributed tracing context). Sonnet misses these 'cross-file invariant' bugs.
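The routing rule described above could be sketched roughly as follows. This is a minimal illustration, not a tested policy: the names `ReviewTask`, `pick_tier`, and `blended_cost` are hypothetical, the thresholds are copied from the summary (more than 100 lines of context across more than 5 files), and the cost figures are relative units derived only from the "one-fifth the cost" claim, not from any price list.

```python
from dataclasses import dataclass

# Thresholds taken from the report summary; treat them as starting points.
MAX_CONTEXT_LINES = 100
MAX_FILES = 5

# Relative cost units (assumption): Sonnet is roughly 1/5th the cost of Opus.
RELATIVE_COST = {"sonnet": 1.0, "opus": 5.0}


@dataclass
class ReviewTask:
    """Hypothetical description of one review request."""
    files_touched: int
    context_lines: int
    crosses_service_boundary: bool = False  # e.g. relies on tracing context


def pick_tier(task: ReviewTask) -> str:
    """Return 'opus' for cross-file invariant work, otherwise 'sonnet'."""
    if task.crosses_service_boundary:
        return "opus"
    if task.context_lines > MAX_CONTEXT_LINES and task.files_touched > MAX_FILES:
        return "opus"
    return "sonnet"


def blended_cost(tasks: list[ReviewTask]) -> float:
    """Average relative cost per task; sending everything to Opus gives 5.0."""
    if not tasks:
        return 0.0
    return sum(RELATIVE_COST[pick_tier(t)] for t in tasks) / len(tasks)
```

For example, a workload of four small diffs and one large multi-file refactoring would route only the refactoring to Opus, giving a blended relative cost of about 1.8 per task under these assumptions. How a task gets flagged as crossing a service boundary is left open here; that signal would have to come from the caller.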
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:16:16.211563+00:00 - report_created - created