Report #99983
[counterintuitive] AI code review is nearly as good as a senior engineer for catching real bugs.
Use LLM review as a triage filter and signature detector, not a primary design auditor. Feed it static-analysis findings to classify, and always have a human audit trust boundaries, missing authorization checks, and business-logic invariants.
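A minimal sketch of the triage split described above, under stated assumptions: the category labels, the `Finding` fields, and the `classify` callable (a stand-in for whatever LLM call a team uses) are hypothetical and not taken from any particular scanner or model API. Only signature-class findings reach the model; everything else, and anything the model is unsure about, goes to a human.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical category labels; real scanners use their own rule IDs.
SIGNATURE_CATEGORIES = {"sql_injection", "hardcoded_credential"}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    category: str
    path: str
    line: int


def triage(
    findings: Iterable[Finding],
    classify: Callable[[Finding], str],
) -> Dict[str, List[Finding]]:
    """Route findings into queues.

    `classify` is an assumed wrapper around an LLM call that returns
    "true_positive", "false_positive", or anything else when unsure.
    """
    queues: Dict[str, List[Finding]] = {
        "llm_confirmed": [],
        "llm_dismissed": [],
        "human_review": [],
    }
    for finding in findings:
        if finding.category not in SIGNATURE_CATEGORIES:
            # Trust boundaries, authorization, business logic, and any
            # unknown category skip the model entirely.
            queues["human_review"].append(finding)
            continue
        verdict = classify(finding)
        if verdict == "true_positive":
            queues["llm_confirmed"].append(finding)
        elif verdict == "false_positive":
            queues["llm_dismissed"].append(finding)
        else:
            # An unsure or malformed verdict falls back to a human.
            queues["human_review"].append(finding)
    return queues
```

The default branch is deliberately the human queue, so a new or mislabeled category never gets silently dismissed by the model.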
Journey Context:
Vendor benchmarks show 89-96% detection for known vulnerability signatures (SQLi, hardcoded creds), so teams assume broad competence. ProjectDiscovery's head-to-head against runtime testing found code-only AI review missed critical business-logic bugs (arbitrary refunds, deactivated users retaining access, cross-account manipulation) while producing plausible-looking false positives. The blind spot is 'absence' reasoning: you cannot detect a missing control from code alone if you don't know the intended policy. Runtime verification and cross-file dependency triage raise recall, but the design/intent ceiling remains.
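The 'absence' problem can be illustrated with a small sketch, assuming the intended policy is written down somewhere a checker can read it. The endpoint names, control names, and the `missing_controls` helper are all hypothetical. A gap is only computable where a policy entry exists; an endpoint with no declared policy is reported as unknown rather than as safe, which is the case a code-only reviewer cannot resolve.

```python
from typing import Dict, Optional, Set


def missing_controls(
    applied: Dict[str, Set[str]],
    policy: Dict[str, Set[str]],
) -> Dict[str, Optional[Set[str]]]:
    """Compare controls found in code against a declared policy.

    Returns, per endpoint, the set of required-but-absent controls,
    or None when no policy is declared (intent unknown).
    Endpoints that satisfy their policy are omitted.
    """
    report: Dict[str, Optional[Set[str]]] = {}
    for endpoint, controls in applied.items():
        if endpoint not in policy:
            # Without a stated intent there is nothing to compare against.
            report[endpoint] = None
            continue
        gap = policy[endpoint] - controls
        if gap:
            report[endpoint] = gap
    return report


# Hypothetical inputs: controls observed in code vs. the intended policy.
applied = {
    "POST /refunds": {"authenticated"},
    "GET /orders": {"authenticated", "owns_order"},
    "POST /admin/export": {"authenticated"},
}
policy = {
    "POST /refunds": {"authenticated", "owns_order", "amount_within_paid_total"},
    "GET /orders": {"authenticated", "owns_order"},
}
report = missing_controls(applied, policy)
```

Here `report` flags the refund endpoint for the two controls it lacks, omits the orders endpoint, and marks the export endpoint as `None`. That last entry is the one that still needs a human to state what the policy should be.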
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:23:23.278506+00:00— report_created — created