Report #60582
[research] LLM gives a correct answer but hallucinates the reasoning, or gives a wrong answer and invents plausible-sounding justifications
Require the model to output its reasoning or justification *before* the final answer. Verify the reasoning chain independently; a correct final answer does not imply correct reasoning.
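A minimal sketch of the reasoning-first requirement, assuming a plain-text response with two labeled sections. The section labels `Reasoning:` and `Answer:`, the prompt wording, and the helper names `build_prompt` and `parse_response` are all hypothetical, chosen only for illustration. The parser rejects any response where the answer shows up before the reasoning.

```python
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    reasoning: str
    answer: str


# Hypothetical prompt template; the section labels are an assumption of this sketch.
PROMPT_TEMPLATE = (
    "Question: {question}\n\n"
    "First write your step-by-step reasoning under the heading 'Reasoning:'.\n"
    "Only after that, give the final answer on a line starting with 'Answer:'.\n"
)


def build_prompt(question: str) -> str:
    """Wrap the question in instructions that ask for reasoning before the answer."""
    return PROMPT_TEMPLATE.format(question=question)


def parse_response(text: str) -> ParsedResponse:
    """Split a response into reasoning and answer, rejecting answer-first output."""
    reasoning_pos = text.find("Reasoning:")
    answer_pos = text.find("Answer:")  # first occurrence, so an early answer is caught
    if reasoning_pos == -1 or answer_pos == -1:
        raise ValueError("response is missing a 'Reasoning:' or 'Answer:' section")
    if answer_pos < reasoning_pos:
        raise ValueError("answer appears before reasoning")
    reasoning = text[reasoning_pos + len("Reasoning:"):answer_pos].strip()
    answer = text[answer_pos + len("Answer:"):].strip()
    if not reasoning:
        raise ValueError("reasoning section is empty")
    return ParsedResponse(reasoning, answer)
```

This only enforces output order. It does not check that the reasoning is sound or that it actually produced the answer, so the reasoning still needs independent verification.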
Journey Context:
Chain-of-Thought (CoT) prompting is meant to improve reasoning, but models can exhibit 'post-hoc rationalization': they leap to an answer via pattern matching, then generate a CoT that retroactively justifies it, even if the logic is flawed or fabricated. This is especially dangerous in factual domains where the 'why' matters as much as the 'what'. Forcing the model to reason first (True CoT) helps, but does not eliminate rationalization. If the justification must be factual, it must be grounded in retrieved text.
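One crude way to check grounding is to require the justification to quote its evidence and then confirm that each quote appears in the retrieved text. The sketch below assumes evidence is cited in double quotes. The function names `unsupported_quotes` and `is_grounded` are hypothetical. Exact substring matching is a stand-in for real verification: it catches fabricated quotes but not misread or misapplied ones.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def unsupported_quotes(justification: str, passages: list[str]) -> list[str]:
    """Return quoted spans in the justification that appear in none of the retrieved passages."""
    quotes = re.findall(r'"([^"]+)"', justification)
    normalized_passages = [_normalize(p) for p in passages]
    return [
        q for q in quotes
        if not any(_normalize(q) in p for p in normalized_passages)
    ]


def is_grounded(justification: str, passages: list[str]) -> bool:
    """Pass only if the justification quotes evidence and every quote is found."""
    quotes = re.findall(r'"([^"]+)"', justification)
    return bool(quotes) and not unsupported_quotes(justification, passages)
```

A usage example with made-up passages:

```python
passages = ["Water boils at 100 degrees Celsius at sea level."]
claim = 'The source says "water boils at 90 degrees Celsius", so the answer is 90.'
print(unsupported_quotes(claim, passages))  # lists the quote that has no match
print(is_grounded(claim, passages))
```

A justification with no quotes at all is treated as ungrounded, on the assumption that an unquoted claim cannot be checked against the retrieved text.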
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:10:34.951551+00:00— report_created — created