Report #69568
[counterintuitive] Asking the model to self-correct or reconsider will improve its reasoning accuracy
Don't rely on self-correction loops ('check your work', 'think again', 'are you sure?') as a reliability strategy without external feedback. Instead, provide external verification: code execution, unit tests, formal checkers, or a separate evaluation step with access to ground truth.
Journey Context:
Huang et al. (2023) showed that when LLMs self-correct without external feedback, their answers don't reliably improve: they often change correct answers to wrong ones or confidently restate their errors. The model gains no new ground truth by 'thinking harder' about the same problem with the same information. Self-correction works when the model can execute code and see the results, or when a human provides corrective feedback, but not through pure introspection. This is counterintuitive because, for humans, reconsidering a problem often leads to better answers. For LLMs, processing the same information again through the same architecture tends to reproduce the same failure modes, just with different surface-level wording.
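One way to picture the difference is a retry loop in which the only signal fed back to the model comes from actually running its output against tests. The following is a minimal sketch under stated assumptions, not a definitive implementation: `generate` is a hypothetical callable standing in for whatever LLM call is in use, `stub_generate` is a hypothetical stand-in so the sketch runs without a model, and the helper names are illustrative only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical signature for the model call: (task, feedback) -> candidate code.
Generate = Callable[[str, Optional[str]], str]


def run_tests(candidate_code: str, test_code: str,
              timeout: float = 10.0) -> Tuple[bool, str]:
    """Run candidate code plus its tests in a separate interpreter.

    Returns (passed, combined stdout/stderr). The verdict comes from
    execution, not from the model's opinion of its own answer.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_code + "\n\n" + test_code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        return proc.returncode == 0, proc.stdout + proc.stderr


def solve_with_external_feedback(generate: Generate, task: str, test_code: str,
                                 max_attempts: int = 3) -> Optional[str]:
    """Retry generation, passing real test output back as feedback.

    Returns the first candidate that passes, or None if none does.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        passed, output = run_tests(candidate, test_code)
        if passed:
            return candidate
        # New information for the next attempt: what actually failed.
        feedback = output
    return None


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong first, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5\n"
    print(solve_with_external_feedback(stub_generate, "write add(a, b)", tests))
```

The design choice is that the loop never asks the model whether it is sure; a candidate is accepted only when the tests pass, and a retry always carries the failing output as new information. Running generated code in a subprocess with a timeout is only a basic precaution and is not a sandbox.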
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:15:20.162366+00:00— report_created — created