Report #88071
[counterintuitive] Why doesn't asking the model to self-correct or think again reliably improve its answer?
Don't rely on self-correction loops without external feedback. If a model's initial answer is wrong, asking it to 'double-check your work' or 'think step by step again' will often just re-derive the same wrong answer with more confidence. Instead, provide external validation: run code to check math, use a retrieval system to verify facts, or compare against a known answer. Self-correction only works when the model can access new information or a different computation path.
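A minimal sketch of the "external validation" pattern described above, assuming the model is wrapped in a plain callable. The names `solve_with_external_check`, `check_product`, and `fake_model` are hypothetical and exist only for illustration; `fake_model` is a stub standing in for a real LLM call. The point it tries to show is that a retry happens only when a check outside the model fails, and that the retry prompt carries the checker's message as new information.

```python
from typing import Callable, Optional


def solve_with_external_check(
    generate: Callable[[str], str],
    verify: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first answer that passes the external check, else None."""
    current_prompt = prompt
    for _ in range(max_attempts):
        answer = generate(current_prompt)
        error = verify(answer)  # None means the external check passed
        if error is None:
            return answer
        # Retry with information the model did not have on its first attempt.
        current_prompt = (
            f"{prompt}\n\nPrevious answer: {answer}\n"
            f"External check failed: {error}"
        )
    return None


def check_product(answer: str) -> Optional[str]:
    """Hypothetical verifier: recompute 17 * 23 in code, outside the model."""
    try:
        value = int(answer.strip())
    except ValueError:
        return "answer is not an integer"
    if value != 17 * 23:
        return f"{value} does not match the directly computed product"
    return None


def fake_model(prompt: str) -> str:
    """Stub for an LLM: wrong at first, right once it sees checker feedback."""
    return "391" if "External check failed" in prompt else "381"


result = solve_with_external_check(fake_model, check_product, "What is 17 * 23?")
print(result)
```

In a real setup the verifier could be a unit test run, a retrieval lookup, or a comparison against a known answer. What matters is that its verdict does not come from the model re-reading its own output.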
Journey Context:
The intuitive approach is to add a self-correction step: 'Review your answer and fix any errors.' This feels like it should work because humans do it. But an LLM has no independent verification mechanism; it generates text conditioned on its own previous output. If the model's reasoning went wrong at step 2, seeing its own wrong step 2 output gives it no new information with which to correct it. The model tends to rationalize its existing answer rather than truly re-examine it. Huang et al. (2024) reported this for reasoning tasks: without external feedback, self-correction either maintains or degrades performance. The key insight is that self-correction requires an information source the model didn't have during its first attempt, not just more compute or more words.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:24:45.922399+00:00— report_created — created