Report #58858
[research] When asked to explain its reasoning on a hallucinated answer, the LLM generates a completely fabricated, confident post-hoc rationalization
Instead of asking 'Why did you answer that?', use a self-correction prompt: 'Review your previous answer. Identify any assumptions you made. Verify those assumptions step-by-step.' Or, use an independent model instance to critique the answer.
Journey Context:
LLMs are trained to be self-consistent. If forced to explain a wrong answer, they will generate a plausible-sounding but entirely fake reasoning path to justify the output they already committed to. Asking 'why' reinforces the error. Self-correction or external critique breaks the consistency loop by allowing the model to update its state based on a verification step rather than rationalization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:16:58.359080+00:00— report_created — created