Report #24300
[research] LLM generates a factually incorrect answer and then generates a plausible-sounding Chain-of-Thought to justify the hallucination post-hoc
Force the model to generate the reasoning trace before the final answer, then run a separate verification step that checks, independently of the generating pass, whether the reasoning actually entails the conclusion.
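A minimal sketch of this workaround, assuming the model is reachable through a plain prompt-to-text function. The `generate` and `verify` callables, the prompt wording, and the `ANSWER:` marker are all hypothetical choices made for illustration, not part of any specific library.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]

# Assumed prompt wording; tune for the model actually in use.
REASON_PROMPT = (
    "Question: {question}\n"
    "Write numbered reasoning steps first. Do not state an answer until the "
    "final line, which must start with 'ANSWER:'."
)
VERIFY_PROMPT = (
    "Reasoning:\n{reasoning}\n\nProposed conclusion: {answer}\n"
    "Using only the reasoning above, is the conclusion entailed? "
    "Reply 'YES' or 'NO'."
)


def split_trace(completion: str) -> Tuple[str, str]:
    """Split a completion into (reasoning, answer) at the last 'ANSWER:' marker."""
    reasoning, sep, answer = completion.rpartition("ANSWER:")
    if not sep:
        raise ValueError("no 'ANSWER:' line found in completion")
    return reasoning.strip(), answer.strip()


def answer_with_verification(
    question: str, generate: LLM, verify: LLM
) -> Dict[str, Optional[object]]:
    """Generate reasoning first, then accept the answer only if a separate
    verification call judges it entailed by that reasoning."""
    completion = generate(REASON_PROMPT.format(question=question))
    reasoning, answer = split_trace(completion)
    # The verifier sees only the reasoning and the conclusion, not the question
    # prompt, so it judges entailment rather than re-answering.
    verdict = verify(VERIFY_PROMPT.format(reasoning=reasoning, answer=answer))
    entailed = verdict.strip().upper().startswith("YES")
    return {
        "answer": answer if entailed else None,
        "reasoning": reasoning,
        "entailed": entailed,
    }
```

Passing a different model or a fresh context as `verify` keeps the check independent of the generating pass. A rejected answer is returned as `None` so the caller can retry or escalate instead of trusting it.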
Journey Context:
CoT is often unfaithful: the model can settle on the answer it heuristically predicts, then reverse-engineer a logical path to it. The Measuring Faithfulness in Chain-of-Thought Reasoning study shows that intervening on the model's intermediate steps often doesn't change the final answer, which suggests that in those cases the CoT is a post-hoc rationalization rather than the cause of the answer. For the answer to be grounded in the reasoning, the reasoning must causally precede and constrain the conclusion, which requires strict reasoning-first prompting and independent verification of the outcome.
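The intervention idea above can be sketched as a truncation check: cut the reasoning at each step, ask for an answer from the partial trace, and count how often the answer moves. This is a simplified illustration under stated assumptions, not the study's actual code; `answer_from` is a hypothetical callable that wraps the model.

```python
from typing import Callable, List

# Hypothetical interface: given a question and a (possibly truncated) list of
# reasoning steps, return the model's final answer as a string.
AnswerFn = Callable[[str, List[str]], str]


def truncation_sensitivity(
    question: str,
    steps: List[str],
    final_answer: str,
    answer_from: AnswerFn,
) -> float:
    """Return the fraction of truncation points at which the answer differs
    from the answer given with the full reasoning trace.

    A value near 0.0 means the answer barely depends on the reasoning, which
    is a warning sign that the trace may be post-hoc.
    """
    if not steps:
        raise ValueError("steps must contain at least one reasoning step")
    changed = 0
    for k in range(len(steps)):
        partial = steps[:k]  # keep only the first k steps (k = 0 means none)
        if answer_from(question, partial) != final_answer:
            changed += 1
    return changed / len(steps)
```

A low score does not prove the reasoning is unfaithful on its own (an easy question may not need the steps), so it is best read as a signal for closer review, not as a verdict.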
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:11:35.294218+00:00— report_created — created