Report #50730
[counterintuitive] AI code review is worse than human review at catching security vulnerabilities
Use AI for the first pass on known vulnerability patterns (CWE catalog entries, OWASP Top 10); it will catch these more reliably than tired humans on large diffs. Then use human experts for business-logic security review, access-control correctness, and novel attack vectors. The two are complementary, not substitutable.
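One way to picture the two-pass split is a small routing step that sends every changed file to the AI pass and additionally queues security-sensitive files for a human expert. This is a minimal sketch, not a recommended tool: `plan_review`, `ReviewPlan`, and the `HUMAN_REVIEW_HINTS` keywords are hypothetical names chosen for illustration, and substring matching on paths is a deliberately crude heuristic (it would also flag a file such as `author.py`).

```python
from dataclasses import dataclass, field

# Hypothetical keywords marking paths that likely touch authorization or
# business rules. A real team would maintain its own list or ownership map.
HUMAN_REVIEW_HINTS = ("auth", "permission", "billing", "role")


@dataclass
class ReviewPlan:
    ai_pass: list[str] = field(default_factory=list)
    human_pass: list[str] = field(default_factory=list)


def plan_review(changed_files: list[str]) -> ReviewPlan:
    """Route every file to the AI pass; add sensitive files to the human pass."""
    plan = ReviewPlan()
    for path in changed_files:
        # First pass: known-pattern detection runs on everything.
        plan.ai_pass.append(path)
        # Second pass: a human reviews anything that looks security-sensitive.
        if any(hint in path.lower() for hint in HUMAN_REVIEW_HINTS):
            plan.human_pass.append(path)
    return plan
```

The point of the design is that the human queue is added on top of the AI pass rather than replacing it, which mirrors the "complementary, not substitutable" framing.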
Journey Context:
The counterintuitive truth is that AI is BETTER than most human reviewers at catching catalogued vulnerability patterns (SQL injection, XSS, path traversal, command injection, insecure deserialization) because it has been trained on CWE databases, CVE writeups, and security advisories at a scale no individual can match. A human reviewer on the 400th line of a diff will miss a subtle injection that an AI will flag consistently. However, AI fails catastrophically on three classes of vulnerability that humans catch: (1) business-logic vulnerabilities, where the code correctly implements an insecure workflow (e.g., an API that properly authenticates but allows privilege escalation because the authorization model is wrong), (2) vulnerabilities that require deployment-context knowledge (e.g., a service mesh configuration that makes an internal endpoint externally reachable), and (3) novel attack vectors not well represented in training data. The Pearce et al. study showed Copilot generating vulnerable code ~40% of the time in security-relevant scenarios, but the underlying pattern-recognition capacity, when pointed at review rather than generation, makes AI a strong detector of known patterns.
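The contrast between a catalogued pattern and a business-logic flaw can be sketched with Python's standard `sqlite3` module. Everything here is illustrative: the `users` table, its columns, and the function names are assumptions made for this example, not code from any real system. `find_user_unsafe` shows the kind of injection (CWE-89) that pattern matching flags readily, while `set_role` is syntactically clean and parameterized yet lets any authenticated caller change any role.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a hypothetical in-memory users table for the examples below."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "alice", "admin"), (2, "bob", "member")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Catalogued pattern: user input is concatenated into the SQL string.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the input is bound as data, not parsed as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def set_role(conn, session_user_id, target_user_id, new_role):
    # Business-logic flaw: the caller is assumed to be authenticated and the
    # query is parameterized, but session_user_id is never checked, so any
    # user can grant any role to anyone, including themselves.
    conn.execute("UPDATE users SET role = ? WHERE id = ?", (new_role, target_user_id))


def set_role_checked(conn, session_user_id, target_user_id, new_role):
    # One possible fix, assuming the rule "only admins may change roles".
    row = conn.execute("SELECT role FROM users WHERE id = ?", (session_user_id,)).fetchone()
    if row is None or row[0] != "admin":
        raise PermissionError("caller may not change roles")
    conn.execute("UPDATE users SET role = ? WHERE id = ?", (new_role, target_user_id))
```

Calling `find_user_unsafe(conn, "x' OR '1'='1")` returns every row, which is the textbook signal a pattern-based reviewer looks for. Nothing in `set_role` looks wrong in isolation; whether it is a vulnerability depends on an authorization rule that lives outside the function, which is why this class of bug tends to need a reviewer who knows the intended access model.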
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:37:55.466194+00:00— report_created — created