Report #75185
[counterintuitive] Asking the model to self-correct or verify its own answer does not reliably improve accuracy
Only rely on self-correction when the model can verify against external ground truth (test results, compiler output, database queries, API responses). Without external feedback, replace self-correction with independent re-sampling and voting, or with tool-based verification.
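A minimal sketch of the re-sampling-and-voting alternative, assuming a hypothetical `sample_answer` callable that returns one independently sampled answer per call (for example, a wrapper around a model call at non-zero temperature). The function name, the `min_share` threshold, and the whitespace/case normalization are illustrative assumptions, not part of the report.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(
    sample_answer: Callable[[], str],  # hypothetical: one fresh sample per call
    n: int = 5,
    min_share: float = 0.5,
) -> Optional[str]:
    """Draw n independent answers and return the most common one.

    Returns None when no answer holds more than min_share of the votes,
    so the caller can escalate instead of trusting a weak plurality.
    """
    # Normalize lightly so "42" and "42 " count as the same vote.
    answers = [sample_answer().strip().lower() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count / n > min_share else None
```

Each sample is drawn without seeing the earlier ones, which is what distinguishes this from asking the model to reconsider its own answer. Returning `None` on a weak plurality is a design choice: disagreement among samples is treated as a signal to fall back to tool-based verification rather than as something to paper over.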
Journey Context:
The intuition is appealing: ask 'are you sure?' and the model catches its mistakes. But without external feedback, the model's 'correction' is just another sample from the same distribution that produced the error. The model has no privileged access to correctness; it cannot verify what it does not know. Empirical studies show that self-correction without external tools at best leaves accuracy unchanged (the model keeps its answer, whether right or wrong, sometimes restating a wrong one with more confidence) and at worst degrades it (the model 'corrects' right answers into wrong ones). Self-correction is reliable only when the model can execute code, query a database, or otherwise ground verification in external reality. Pure textual self-correction is theater: it changes the answer, not the accuracy.
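The grounded variant can be sketched as a retry loop in which only an external check decides whether a candidate is accepted. Everything named here is a hypothetical illustration: `generate` stands in for a model call that receives the previous failure message (or `None` on the first attempt), and `run_unit_check` is a toy verifier that executes the candidate and tests a function called `add`.

```python
from typing import Callable, Optional, Tuple


def refine_with_verifier(
    generate: Callable[[Optional[str]], str],      # hypothetical model wrapper
    verify: Callable[[str], Tuple[bool, str]],     # external ground-truth check
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that passes verify, or None if none does."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)  # feedback comes from the tool, not the model
        if ok:
            return candidate
    return None


def run_unit_check(source: str) -> Tuple[bool, str]:
    """Toy verifier: execute the candidate and test its add() function.

    exec() on untrusted model output is unsafe; a real setup would run
    this in a sandboxed subprocess or container.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

The loop never asks the model whether its answer is right. The model only proposes candidates, and the error text it sees on a retry is produced by actually running the code, which is the external signal the report says is required.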
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:47:26.635741+00:00— report_created — created