Report #8277
[research] LLM produces a correct answer with flawed or hallucinated intermediate reasoning steps
Do not judge the correctness of a chain of thought solely by its final answer. If reasoning fidelity is required, validate each step against a knowledge base, or use process reward models (PRMs), which score individual steps, rather than outcome reward models (ORMs), which score only the final answer.
Journey Context:
LLMs tend to be outcome-driven. When generating step by step, a model that reaches the correct answer through an invalid jump in logic will often fabricate a plausible-sounding explanation to bridge the gap after the fact. This is the 'right answer, wrong reason' trap. Fine-tuning on samples selected only by final-answer correctness rewards the hallucinated rationale along with the answer, and so reinforces it.
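The gap between outcome-level and step-level checking can be illustrated with a minimal sketch. It assumes a toy setting in which every reasoning step is a single arithmetic statement of the form `a op b = c`. The names `check_step`, `score_chain`, and `ChainScore` are hypothetical and not part of any library. The rule-based `check_step` only stands in for a real step verifier such as a knowledge-base lookup or a trained PRM.

```python
import re
from dataclasses import dataclass

# Assumed toy step format: "a op b = c" with integer operands.
# A real chain of thought would need a proper parser, a knowledge-base
# lookup, or a trained process reward model instead of this regex.
STEP_RE = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class ChainScore:
    outcome_ok: bool      # what an outcome-only check sees
    step_ok: list[bool]   # what a step-level check sees

    @property
    def faithful(self) -> bool:
        # Accept a sample only if the answer AND every step check out.
        return self.outcome_ok and all(self.step_ok)


def check_step(step: str) -> bool:
    """Return True if the step is a parseable, arithmetically valid statement."""
    match = STEP_RE.match(step)
    if match is None:
        return False  # unparseable steps are treated as unverified
    a, op, b, claimed = match.groups()
    return OPS[op](int(a), int(b)) == int(claimed)


def score_chain(steps: list[str], final_answer: int, expected: int) -> ChainScore:
    """Score a chain at both the outcome level and the step level."""
    return ChainScore(
        outcome_ok=(final_answer == expected),
        step_ok=[check_step(s) for s in steps],
    )


if __name__ == "__main__":
    # Task: 12 * 4 + 2. The chain below reaches 50, but its first step is
    # wrong and its second step quietly adds 4 instead of 2 to compensate.
    flawed = score_chain(["12 * 4 = 46", "46 + 4 = 50"], final_answer=50, expected=50)
    print(flawed.outcome_ok, flawed.step_ok, flawed.faithful)
```

An outcome-only filter would keep the flawed chain as a positive fine-tuning example, while the step-level filter rejects it. The second step also passes the local check even though it does not follow from the task, so a per-step validity check is a lower bound on scrutiny and not a full fidelity guarantee.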
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:09:23.653479+00:00— report_created — created