Report #70226
[counterintuitive] AI coding agents can reliably self-correct by reviewing their own output and fixing mistakes
After 2-3 failed self-correction attempts on the same problem within the same context, stop and reset. Then do one of three things: start a fresh conversation with a different framing, provide external feedback (compiler errors, test results), or break the problem into smaller sub-tasks. Self-correction works for surface fixes (syntax, missing imports) but not for reasoning errors.
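The attempt budget above can be sketched as a small loop. This is a minimal illustration, not a prescribed implementation: `correct_with_budget`, `Outcome`, `syntax_check`, and the `propose_fix` callable are hypothetical names, and the limit of 3 is an assumption taken from the upper end of the "2-3 attempts" guidance. The only external feedback used here is Python's built-in `compile`, standing in for a real compiler or test run.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumption: upper end of the "2-3 attempts" guidance


@dataclass
class Outcome:
    status: str               # "fixed" or "reset"
    attempts: int             # number of fix attempts actually made
    candidate: Optional[str]  # final source when fixed, otherwise None


def syntax_check(source: str) -> Optional[str]:
    """External feedback: return None if the source compiles, else the error."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"


def correct_with_budget(
    propose_fix: Callable[[str, str], str],
    check: Callable[[str], Optional[str]],
    candidate: str,
    max_attempts: int = MAX_ATTEMPTS,
) -> Outcome:
    """Retry fixes driven by external feedback; stop once the budget is spent.

    propose_fix(candidate, feedback) is a hypothetical stand-in for the agent.
    check(candidate) returns None on success or an error message otherwise.
    """
    feedback = check(candidate)
    attempts = 0
    while feedback is not None and attempts < max_attempts:
        candidate = propose_fix(candidate, feedback)
        attempts += 1
        feedback = check(candidate)
    if feedback is None:
        return Outcome("fixed", attempts, candidate)
    # Budget exhausted: the caller should reset the context or split the task.
    return Outcome("reset", attempts, None)
```

A "reset" outcome is the signal to open a fresh conversation, reframe the task, or decompose it, rather than to ask for one more fix in the same context.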
Journey Context:
The intuition is that iterative self-correction should work: tell the AI what's wrong and it fixes it. It fails for a specific reason. When the AI's underlying model of the problem is wrong, self-correction within the same context often reinforces the error rather than fixing it. The AI generates explanations consistent with its wrong approach, then generates 'fixes' consistent with those explanations. Each iteration may fix the reported symptom while introducing new issues, or oscillate between related wrong solutions. Research shows that without external feedback (test results, compiler errors, human correction), LLM self-correction on reasoning tasks often makes output worse, not better. The dangerous part is that self-correction appears to work: the AI's explanations for its fixes sound reasonable even when the fixes are wrong or introduce new bugs. Surface-level fixes (syntax errors, missing imports) do self-correct well; reasoning errors (wrong algorithm, wrong mental model of data flow) do not.
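One way to notice the oscillation described above is to fingerprint each candidate the agent produces and flag the first one that repeats an earlier attempt. The sketch below is a heuristic under stated assumptions: `fingerprint` and `first_repeat` are hypothetical helpers, and the normalization (dropping blank lines and trailing whitespace) only catches exact repeats, not solutions that are merely similar.

```python
import hashlib
from typing import List, Optional


def fingerprint(source: str) -> str:
    """Hash a candidate, ignoring blank lines and trailing whitespace."""
    lines = [line.rstrip() for line in source.splitlines()]
    normalized = "\n".join(line for line in lines if line)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def first_repeat(candidates: List[str]) -> Optional[int]:
    """Return the index of the first candidate that repeats an earlier one.

    A repeat suggests the agent is cycling between the same solutions,
    which is a cue to reset the context instead of asking for another fix.
    """
    seen = set()
    for index, source in enumerate(candidates):
        digest = fingerprint(source)
        if digest in seen:
            return index
        seen.add(digest)
    return None
```

For example, `first_repeat(["a = 1", "a = 2", "a = 1  \n"])` flags index 2, because the third attempt matches the first once whitespace is normalized.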
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:27:13.370028+00:00— report_created — created