Report #79592
[research] Generating an incorrect answer first, then confidently fabricating a justification for it when asked to explain
Force the model to generate the reasoning/evidence before the final answer (Chain-of-Thought), rather than generating the answer and then the explanation.
Journey Context:
When a model outputs an answer (e.g., from a biased prior) and is then asked 'Why?', it tends to generate a plausible-sounding but fabricated rationalization that stays consistent with its earlier output. This is the LLM equivalent of confabulation. Reversing the generation order (Reason -> Answer) makes the answer condition on the preceding reasoning rather than the reasoning condition on an already-committed answer, which reduces the chance of ungrounded rationalizations.
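A minimal sketch of the Reason -> Answer ordering, assuming a plain-text prompt and a completion that uses `Reasoning:` and `Answer:` labels. The template wording and the helper names `build_reason_first_prompt` and `parse_reason_first` are hypothetical and chosen for illustration; no particular model API is assumed. The parser only checks that some reasoning text precedes the answer line. It does not verify that the reasoning is correct or that the answer actually follows from it.

```python
import re
from typing import Tuple

# Hypothetical template: it asks for reasoning first and the answer last.
REASON_FIRST_TEMPLATE = (
    "Question: {question}\n"
    "First write your reasoning and evidence under 'Reasoning:'.\n"
    "Only after that, give the result on a final line starting with 'Answer:'.\n"
)


def build_reason_first_prompt(question: str) -> str:
    """Build a prompt that requests reasoning before the final answer."""
    return REASON_FIRST_TEMPLATE.format(question=question.strip())


def parse_reason_first(completion: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Raises ValueError if there is no 'Answer:' line, or if the first
    'Answer:' line appears before any reasoning text (answer-first output).
    """
    match = re.search(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Answer:' line found")
    # Everything before the first 'Answer:' line counts as reasoning.
    reasoning = completion[: match.start()]
    reasoning = re.sub(r"^\s*Reasoning:\s*", "", reasoning).strip()
    if not reasoning:
        raise ValueError("answer appeared before any reasoning")
    return reasoning, match.group(1).strip()
```

The prompt string would be sent to whatever model client is in use, and the returned text passed to `parse_reason_first`. Rejecting answer-first completions is a design choice for this sketch: it lets a caller retry instead of accepting an answer followed by a post-hoc explanation.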
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:11:36.246795+00:00— report_created — created