Report #93908
[counterintuitive] Clean, well-structured AI-generated code is more likely to be correct
Review all AI-generated code against a mandatory invariant-checking checklist that specifically targets error paths, edge cases, and implicit contract violations, regardless of how clean the code looks. Never use formatting quality as a proxy for logical correctness.
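One way to make such a checklist mandatory is to encode it as a review gate. The sketch below is illustrative only: the `CHECKLIST` wording, the `Review` class, and the `style_score` field are hypothetical names, not part of any existing tool. The point it shows is that approval depends solely on the checklist items, and the style score is recorded but never consulted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical checklist items; adapt the wording to your own codebase.
CHECKLIST = (
    "error paths: every failure mode is handled or propagated deliberately",
    "edge cases: empty, boundary, and oversized inputs are exercised",
    "implicit contracts: caller and callee assumptions are stated and checked",
)


@dataclass
class Review:
    # Assumed to come from a linter or formatter; stored for the record only.
    style_score: float
    verified: Dict[str, bool] = field(default_factory=dict)

    def mark(self, item: str, passed: bool) -> None:
        """Record the reviewer's verdict for one checklist item."""
        if item not in CHECKLIST:
            raise KeyError(f"unknown checklist item: {item}")
        self.verified[item] = passed

    def outstanding(self) -> List[str]:
        """Return items that are unchecked or that failed."""
        return [item for item in CHECKLIST if not self.verified.get(item, False)]

    def can_approve(self) -> bool:
        # Deliberately ignores style_score: formatting is not evidence of correctness.
        return not self.outstanding()
```

A review with a perfect style score and an empty checklist is still blocked, while a review with a poor style score and a fully verified checklist can be approved.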
Journey Context:
Humans use code readability as a heuristic for code quality. The heuristic is reasonable for human-written code: experienced engineers tend to write code that is both cleaner and more correct, so the two properties correlate. AI-generated code breaks that correlation. A model typically produces well-formatted, convention-following code whether or not the logic is correct, so formatting carries no signal about correctness: clean AI code should be assumed no less buggy than messy AI code, yet reviewers apply less scrutiny to clean code. This 'competence halo effect' means the most dangerous AI-generated bugs tend to sit in the cleanest-looking code, because that code receives the least human review. Buse & Weimer showed readability strongly affects perceived quality in human review; AI output triggers the same 'an expert wrote this' heuristic while potentially carrying the logical error rate of a novice. The fix is to decouple formatting assessment from correctness assessment entirely when reviewing AI output.
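A small constructed example of the effect described above (not taken from any real incident): `chunk_tidy_but_wrong` reads as idiomatic and passes a happy-path check, yet it silently drops data on some inputs and accepts an invalid argument. The function names and the specific invariants are assumptions chosen for illustration; the checker looks only at behavior on edge cases and error paths, never at how the code looks.

```python
from typing import Callable, List


def chunk_tidy_but_wrong(items: List[int], size: int) -> List[List[int]]:
    """Split items into consecutive chunks of at most `size` elements."""
    # Looks idiomatic, but the stop value drops the final element whenever
    # len(items) - 1 is a multiple of size, and a negative size returns [].
    return [items[i:i + size] for i in range(0, len(items) - 1, size)]


def chunk_checked(items: List[int], size: int) -> List[List[int]]:
    """Same contract, with the error path and the boundary handled."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def violated_invariants(chunker: Callable[[List[int], int], List[List[int]]]) -> List[str]:
    """Return a description of every invariant the chunker violates."""
    failures = []
    # Edge cases: empty input, single element, length not a multiple of size.
    for items in ([], [1], [1, 2, 3], [1, 2, 3, 4]):
        for size in (1, 2, 5):
            chunks = chunker(items, size)
            flattened = [x for chunk in chunks for x in chunk]
            if flattened != items:
                failures.append(f"items lost or reordered: items={items}, size={size}")
            if any(not 1 <= len(chunk) <= size for chunk in chunks):
                failures.append(f"chunk size out of bounds: items={items}, size={size}")
    # Error path: a non-positive size must be rejected, not silently accepted.
    for bad_size in (0, -1):
        try:
            chunker([1, 2, 3], bad_size)
        except ValueError:
            continue
        failures.append(f"accepted invalid size={bad_size}")
    return failures
```

Calling `chunk_tidy_but_wrong([1, 2, 3, 4], 2)` gives the expected two chunks, which is the kind of spot check a reviewer reassured by tidy formatting might stop at. `violated_invariants(chunk_tidy_but_wrong)` reports the lost elements and the accepted negative size, while `violated_invariants(chunk_checked)` returns an empty list.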
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:12:44.432087+00:00— report_created — created