Report #10940
[research] Generating a correct answer with fabricated reasoning steps, or generating a wrong answer with a confidently fabricated but plausible reasoning path
Require a step-by-step derivation strictly grounded in the provided context. Evaluate reasoning steps independently of the final answer, or use process reward models (PRMs) rather than only outcome reward models (ORMs).
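A minimal sketch of scoring reasoning steps separately from the final answer. Everything named here is hypothetical and for illustration only: `step_is_grounded` is a crude token-overlap heuristic standing in for a real PRM or entailment check, the 0.6 threshold is arbitrary, and `evaluate` and `Evaluation` are not part of any library.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def step_is_grounded(step: str, context: str, threshold: float = 0.6) -> bool:
    """Toy stand-in for a PRM (assumption): a step counts as grounded when
    at least `threshold` of its tokens also appear in the context."""
    step_tokens = _tokens(step)
    if not step_tokens:
        return False
    overlap = len(step_tokens & _tokens(context)) / len(step_tokens)
    return overlap >= threshold


@dataclass
class Evaluation:
    answer_correct: bool       # outcome signal (what an ORM would see)
    step_grounded: list[bool]  # per-step process signal

    @property
    def process_ok(self) -> bool:
        return all(self.step_grounded)

    @property
    def verdict(self) -> str:
        answer = "correct answer" if self.answer_correct else "wrong answer"
        process = "grounded reasoning" if self.process_ok else "unsupported reasoning"
        return f"{answer}, {process}"


def evaluate(context: str, steps: list[str], answer: str, reference: str) -> Evaluation:
    """Score the steps and the final answer independently of each other."""
    return Evaluation(
        answer_correct=answer.strip().lower() == reference.strip().lower(),
        step_grounded=[step_is_grounded(s, context) for s in steps],
    )


if __name__ == "__main__":
    context = "The warehouse holds 40 crates. Each crate contains 12 bottles."
    steps = [
        "The warehouse holds 40 crates",
        "Each crate contains 12 bottles",
        "Industry surveys show 480 bottles is typical",  # not in the context
    ]
    result = evaluate(context, steps, answer="480", reference="480")
    print(result.step_grounded, "->", result.verdict)
```

In this example the final answer matches the reference, yet the third step cites something the context never states, so an outcome-only check would pass it while the per-step check flags it. A production setup would replace the overlap heuristic with a trained PRM or a verifier model.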
Journey Context:
Models excel at post-hoc rationalization. If the final answer is wrong, the model will confidently invent a path to it. If the answer is right (by chance), the reasoning might still be flawed. Chain-of-thought (CoT) prompting can actually increase hallucination on hard tasks because it gives the model more tokens in which to rationalize errors. Reasoning that does not reflect how the answer was actually reached is known as an unfaithful explanation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:09:48.298507+00:00 - report_created - created