Report #56795
[cost_intel] When is expensive reasoning justified for security scanning vs. cheap pattern matching?
Use cheap regex/SAST tools and GPT-4o for obvious vulnerabilities (SQLi via string concatenation, hardcoded secrets). Escalate to o1/o3 ONLY for complex dataflow analysis (second-order injection, race conditions in auth flows, multi-step taint analysis), meaning categories where cheap models show a false-negative rate above 50%.
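A minimal sketch of this routing rule follows. It assumes per-CWE false-negative rates for the cheap tier have already been measured on a labelled corpus. The rate table, the two regexes, and the names `cheap_scan` and `route` are hypothetical illustrations, not part of any real SAST tool. The 0.5 threshold is the ">50% false negative" rule above, and the CWE-362/CWE-943 entries only mirror the "<10% recall" figure quoted in this report.

```python
import re

# Hypothetical false-negative rates of the cheap tier (regex/SAST + GPT-4o) per CWE.
# Placeholders for illustration only; measure these on your own labelled corpus.
CHEAP_TIER_FN_RATE = {
    "CWE-89": 0.05,   # basic SQLi via string concatenation (placeholder)
    "CWE-798": 0.05,  # hardcoded credentials (placeholder)
    "CWE-362": 0.92,  # race conditions: mirrors the "<10% recall" claim
    "CWE-943": 0.91,  # data query logic injection: mirrors the "<10% recall" claim
}

ESCALATION_THRESHOLD = 0.5  # escalate when the cheap tier misses >50% of cases

# Deliberately naive patterns that cover only the "obvious" cases.
CHEAP_PATTERNS = {
    # execute("...literal..." + something): SQL built by string concatenation
    "CWE-89": re.compile(r"""execute\(\s*["'][^"']*["']\s*\+"""),
    # api_key / secret / password assigned a quoted literal of 8+ characters
    "CWE-798": re.compile(
        r"""(?i)\b(api_key|secret|password)\s*=\s*["'][^"']{8,}["']"""
    ),
}


def cheap_scan(source: str) -> list[str]:
    """Return the CWE ids whose cheap pattern matches somewhere in `source`."""
    return [cwe for cwe, pattern in CHEAP_PATTERNS.items() if pattern.search(source)]


def route(cwe: str) -> str:
    """Return 'reasoning' if the cheap tier's FN rate exceeds the threshold, else 'cheap'."""
    # Categories with no measurement are escalated: assume the worst recall.
    fn_rate = CHEAP_TIER_FN_RATE.get(cwe, 1.0)
    return "reasoning" if fn_rate > ESCALATION_THRESHOLD else "cheap"


if __name__ == "__main__":
    print(cheap_scan('cursor.execute("SELECT * FROM users WHERE id = " + user_id)'))  # ['CWE-89']
    print(cheap_scan('API_KEY = "sk_live_0123456789"'))  # ['CWE-798']
    print(route("CWE-89"), route("CWE-362"), route("CWE-1234"))  # cheap reasoning reasoning
    # Same injection, but the concatenation lives in a helper: nothing is flagged.
    print(cheap_scan("query = build_user_query(user_id)\ncursor.execute(query)"))  # []
```

The last call shows the cheap tier's blind spot in miniature. Once the query is assembled inside a helper, the regex no longer sees the concatenation and reports nothing, which is why categories that depend on cross-function dataflow are the ones worth escalating.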
Journey Context:
Security scanning exhibits a 'U-shaped cost-effectiveness curve'. Cheap pattern matching catches the low-hanging fruit (obvious XSS, hardcoded keys) at near-zero cost. Mid-tier LLMs (GPT-4o) catch slightly more but have high false-positive rates; for example, they hallucinate SQLi in code that already uses prepared statements. Reasoning models catch subtle vulnerabilities (timing attacks, complex injection via multiple parameter paths) but cost 30x more. The ROI inflection point is vulnerability complexity. For CWE-89 (basic SQL injection), reasoning models are wasted spend. For CWE-362 (race conditions) or CWE-943 (improper neutralization in data query logic, e.g. polyglot or NoSQL injection), cheap models reach <10% recall while reasoning models achieve 70%+. The degradation signature of cheap models is 'false confidence'. They mark inputs as safe because their shallow dataflow analysis never follows the data into utility functions, so they miss that sanitization there is absent or incomplete.
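The inflection point can be made concrete with a simple expected-cost model. This model is an assumption, not a formula given in the report. Suppose a cheap scan costs c per unit and a reasoning scan costs k * c. Escalating pays off when the extra vulnerabilities found outweigh the extra spend, that is, when p * (r_reason - r_cheap) * M > (k - 1) * c. Here p is the prevalence of the vulnerability in scanned units and M is the (hypothetical) cost of one missed vulnerability. The function names `breakeven_miss_cost` and `should_escalate` are illustrative. Only k = 30, r_cheap = 0.1, and r_reason = 0.7 come from the figures above.

```python
import math


def breakeven_miss_cost(
    cheap_cost: float,
    cost_multiplier: float,
    cheap_recall: float,
    reasoning_recall: float,
    prevalence: float,
) -> float:
    """Smallest cost of one missed vulnerability at which escalation pays off.

    Escalate when prevalence * (reasoning_recall - cheap_recall) * miss_cost
    exceeds the extra scan spend (cost_multiplier - 1) * cheap_cost.
    """
    extra_recall = reasoning_recall - cheap_recall
    if extra_recall <= 0 or prevalence <= 0:
        # The expensive tier finds nothing extra, so no miss cost justifies it.
        return math.inf
    extra_scan_cost = (cost_multiplier - 1) * cheap_cost
    return extra_scan_cost / (prevalence * extra_recall)


def should_escalate(miss_cost: float, **model: float) -> bool:
    """True if the expected value of the extra findings exceeds the extra scan cost."""
    return miss_cost > breakeven_miss_cost(**model)


if __name__ == "__main__":
    # CWE-362-like category, using the report's figures (30x cost, 10% vs 70% recall).
    # Prevalence 1.0 assumes every scanned unit contains the bug.
    concurrency = dict(cheap_cost=1.0, cost_multiplier=30, cheap_recall=0.1,
                       reasoning_recall=0.7, prevalence=1.0)
    print(round(breakeven_miss_cost(**concurrency), 1))  # 48.3
    # Hypothetical CWE-89-like case where both tiers already reach the same recall.
    basic_sqli = dict(concurrency, cheap_recall=0.7)
    print(breakeven_miss_cost(**basic_sqli))  # inf
```

Under these assumptions, a missed race condition only needs to cost about 48 cheap-scan units for escalation to pay off. If the bug appears in just 1% of scanned units, that threshold rises a hundredfold, to roughly 4,833 units. When the cheap tier already matches the reasoning tier's recall, no miss cost justifies the 30x spend.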
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:49:24.312339+00:00— report_created — created