Report #94316
[counterintuitive] Ask the LLM to check its own work and fix mistakes via a self-correction loop
Do not rely on self-correction loops without external feedback. If the model generates a wrong answer, asking it to 'double-check' without new information (test results, compiler output, reference data) will often produce the same wrong answer with higher confidence, or a different wrong answer. Always inject ground-truth feedback: run the code, check it against a test suite, or compare it to a reference.
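A minimal sketch of a feedback-grounded loop, assuming a Python code-generation task checked by a small test suite. All names here (`run_tests`, `repair_loop`, `stub_generate`, `TESTS`) are hypothetical; `stub_generate` stands in for a real model call and is hard-coded to fix its off-by-one bug only once a concrete failing test appears in its input. In real use, `exec` on model output should run in a sandbox.

```python
from typing import Callable, Optional

# (args, expected_output): ground truth the model cannot produce for itself.
TestCase = tuple[tuple, object]


def run_tests(src: str, func_name: str, tests: list[TestCase]) -> list[str]:
    """Execute candidate source; return failure messages (empty list means all passed)."""
    namespace: dict = {}
    try:
        exec(src, namespace)  # NOTE: sandbox untrusted model output in practice
        func = namespace[func_name]
    except Exception as exc:
        return [f"load error: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def repair_loop(generate: Callable[[str, list[str]], str], task: str,
                func_name: str, tests: list[TestCase], max_rounds: int = 3) -> Optional[str]:
    """Generate, run the tests, and feed real failures back; give up after max_rounds."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = generate(task, feedback)  # feedback is external evidence, not self-critique
        feedback = run_tests(candidate, func_name, tests)
        if not feedback:
            return candidate
    return None


def stub_generate(task: str, feedback: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: off-by-one on the first draft,
    corrected only when a concrete failing test is supplied."""
    if not feedback:
        return "def total(n):\n    return sum(range(n))\n"
    return "def total(n):\n    return sum(range(n + 1))\n"


TESTS: list[TestCase] = [((3,), 6), ((0,), 0)]
print(run_tests(stub_generate("sum 1..n", []), "total", TESTS))
# ['total(3,) returned 3, expected 6']
print(repair_loop(stub_generate, "sum 1..n", "total", TESTS))
```

The design choice that matters is what flows back into `generate`: the exact failing input, the actual output, and the expected output. That is information the model did not have when it wrote the first draft.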
Journey Context:
The common pattern in agent design is a self-correction loop: generate → review → fix. The assumption is that the model can evaluate its own output and catch its own mistakes. Research shows this is unreliable for reasoning tasks. When a model produces an incorrect answer, it has already committed to a reasoning path. Asking it to 'verify' without new external information typically leads it to rationalize its existing answer rather than genuinely re-evaluate it. The model cannot tell its correct outputs from its incorrect ones any more reliably than it generated them, because generation and verification both draw on the same learned distribution. Self-correction works only when the model receives genuinely new information (e.g., a compiler error or a failed test) that constrains the space of valid corrections. Without that, you're just sampling from the same distribution twice.
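To make the "same distribution" argument concrete, here is a deliberately tiny toy. It assumes a hypothetical `BELIEFS` table standing in for the model's learned distribution; both `model_answer` and `model_self_check` read from it, so the self-check can only confirm what the model already believes, while `external_check` evaluates the arithmetic independently. It illustrates the argument only and is not a model of how an LLM works internally.

```python
# Toy illustration: one hypothetical belief table stands in for the model's
# learned distribution. 17*24 is actually 408, so the first entry is wrong.
BELIEFS: dict[str, int] = {"17*24": 398, "12*12": 144}


def model_answer(question: str) -> int:
    """'Generation': read the answer out of the model's beliefs."""
    return BELIEFS[question]


def model_self_check(question: str, answer: int) -> bool:
    """'Double-check' with no new information: re-derive from the same beliefs."""
    return model_answer(question) == answer


def external_check(question: str, answer: int) -> bool:
    """Ground truth from outside the model: actually evaluate the product."""
    a, b = map(int, question.split("*"))
    return a * b == answer


def audit(question: str) -> tuple[int, bool, bool]:
    """Return (answer, self-check verdict, external verdict)."""
    answer = model_answer(question)
    return answer, model_self_check(question, answer), external_check(question, answer)


for q in BELIEFS:
    print(q, *audit(q))
# 17*24 398 True False   <- self-check approves the wrong answer; the external check catches it
# 12*12 144 True True
```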
Lifecycle
2026-06-22T16:53:46.412272+00:00— report_created — created