Report #59982
[counterintuitive] AI coding agents can reliably fix their own mistakes through iterative self-correction
After 2 failed self-repair attempts, stop the loop and introduce external feedback: run tests, use a linter, check compiler output, consult documentation, or escalate to a human. Self-correction without new external information is largely performative — the model is re-sampling from its own distribution, not genuinely debugging.
Journey Context:
The intuitive belief is that if an AI makes a mistake, showing it the error will help it fix it — similar to how a human developer reads a compiler error and fixes the code. This is dangerously wrong for a specific reason: without external feedback \(test results, compiler errors, human input\), the AI is just re-rolling its own distribution. It does not have access to ground truth about why its output was wrong, so it tends to either repeat the same mistake or introduce new bugs while 'fixing' the original one. Research shows that self-correction without external feedback is largely ineffective. However, self-correction WITH external feedback \(test output, compiler errors, type checker results\) IS effective because it provides genuine new information. The dangerous middle ground: the AI generates its own 'feedback' \('I see the issue, the problem is...'\) which sounds like reasoning but is actually post-hoc rationalization of a new guess. This creates the illusion of debugging while the agent is actually just trying different random solutions. The practical rule: if the agent cannot fix it in 2 tries with its own reasoning, it needs external signal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:10:13.026649+00:00— report_created — created