Report #54372
[counterintuitive] AI code review catches all the same bug classes as human reviewers
Use AI review for syntactic and pattern-based issues (style, known CVE signatures, common anti-patterns). Mandate human review for concurrency bugs, authorization logic, business rule violations, and trust boundary violations. Treat AI and human review as complementary, not substitutive.
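One way to make this split enforceable is a small routing rule in the review pipeline. The sketch below is illustrative only: the label names, the `Change` type, and `required_reviewers` are hypothetical and do not come from any particular tool. It assumes each change already carries category labels, which in practice is the hard part.

```python
from dataclasses import dataclass

# Assumed label vocabulary, mirroring the categories named in the recommendation.
AI_SUITABLE = frozenset({"style", "known-cve", "anti-pattern"})


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a change under review."""
    path: str
    labels: frozenset


def required_reviewers(change: Change) -> set:
    """Return which reviewers must sign off on a change.

    AI review runs on everything. A human is required unless every label
    is one the AI handles well; unlabeled changes default to human review.
    """
    reviewers = {"ai"}
    if not change.labels or not change.labels <= AI_SUITABLE:
        reviewers.add("human")
    return reviewers
```

The conservative default matters here: a change with no labels, or with any label outside the AI-suitable set (for example `authorization` or `concurrency`), still goes to a human.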
Journey Context:
AI code review excels at catching surface-level and pattern-matched issues: naming violations, SQL injection patterns, missing null checks. But it systematically misses entire bug classes that require understanding intent and context: race conditions (requires reasoning about the temporal ordering of concurrent operations), authorization bypass (requires understanding who should access what and why), business logic errors (requires domain knowledge not present in the code), and multi-service trust boundary violations (requires knowing which inputs each service is entitled to trust). The dangerous calibration failure: teams observe AI catching bugs that humans missed (usually easy, pattern-based bugs) and infer that AI catches bugs of similar difficulty across all categories. In reality, AI catches many easy bugs and very few hard ones. Issue-resolution benchmarks, which measure a related but not identical task, point the same way: reported SWE-bench results put AI systems at roughly 20-40% of real GitHub issues resolved (the figure varies by model and benchmark version), with the hardest categories being multi-file bugs and those requiring deep project-specific understanding. The result: teams that replace human review with AI review see an initial drop in trivial bugs but a gradual accumulation of exactly the bugs that cause production incidents.
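To make the race-condition case concrete, here is a minimal sketch of a check-then-act bug. Every name in it (`Account`, `withdraw_unsafe`, the `between_check_and_act` hook) is hypothetical and exists only for illustration. No single line of the unsafe method matches a known bad pattern; the defect appears only when two calls interleave, which the demo forces with a barrier.

```python
import threading


class Account:
    """Hypothetical account used only to illustrate a check-then-act race."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_unsafe(self, amount: int, between_check_and_act=lambda: None) -> bool:
        # Each line is locally reasonable; the bug is the gap between the
        # check and the update, which another thread can enter.
        if self.balance >= amount:
            between_check_and_act()  # test hook that forces the bad interleaving
            self.balance -= amount
            return True
        return False

    def withdraw_safe(self, amount: int) -> bool:
        # Holding the lock makes the check and the update one atomic step.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrently(fn, n: int = 2) -> list:
    """Run fn(i) on n threads and collect the return values by index."""
    results = [None] * n

    def worker(i: int) -> None:
        results[i] = fn(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def demo_unsafe() -> list:
    account = Account(100)
    # The barrier holds each thread after its check until both have passed it.
    barrier = threading.Barrier(2)
    return run_concurrently(lambda _: account.withdraw_unsafe(100, barrier.wait))


def demo_safe() -> tuple:
    account = Account(100)
    results = run_concurrently(lambda _: account.withdraw_safe(100))
    return sorted(results), account.balance
```

In `demo_unsafe`, both withdrawals of 100 succeed against a balance of 100, because both threads pass the check before either updates the balance. In `demo_safe`, exactly one succeeds and the balance ends at 0. Spotting the first case in review means reasoning about which interleavings are possible, not matching a code pattern.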
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:45:42.041544+00:00— report_created — created