Report #52847
[research] LLM produces a factually incorrect step in a Chain-of-Thought (CoT) yet still arrives at the right answer, or rationalizes a wrong answer with a fabricated CoT
Evaluate the factual accuracy of the intermediate reasoning steps, not just the final answer. Use step-by-step verification models (Process Reward Models) rather than outcome-based scoring.
Journey Context:
CoT improves reasoning but also improves the model's ability to rationalize. If the model guesses the wrong answer, it can confidently generate a fabricated CoT to justify it (post-hoc rationalization). The reverse also happens: a chain containing a false step can still land on the right answer. Outcome-based reward models (ORMs) miss both cases because they only check the final result. Process reward models (PRMs) score each step, so hallucinated intermediate logic is penalized even when the final answer is correct.
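The ORM/PRM difference described above can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `Trace`, `outcome_score`, `process_score`, and `toy_arithmetic_scorer` are made-up names, the step scorer is a toy arithmetic checker standing in for a trained PRM, and aggregating step scores with `min` is one assumed choice among several (a product or mean would also be reasonable).

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    steps: List[str]  # intermediate reasoning steps, one claim per step
    answer: str       # final answer produced by the model


def outcome_score(trace: Trace, gold: str) -> float:
    """ORM-style scoring: looks only at the final answer."""
    return 1.0 if trace.answer.strip() == gold.strip() else 0.0


def process_score(trace: Trace, step_scorer: Callable[[str], float]) -> float:
    """PRM-style scoring: score every step, then aggregate.

    Assumption: min-aggregation, so a single bad step sinks the whole trace.
    """
    if not trace.steps:
        return 0.0
    return min(step_scorer(step) for step in trace.steps)


# Toy stand-in for a learned step verifier: checks claims like "2 + 3 = 5".
_STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")


def toy_arithmetic_scorer(step: str) -> float:
    match = _STEP.match(step)
    if match is None:
        return 0.0  # unparseable steps are treated as unverified
    a, op, b, claimed = match.groups()
    a, b, claimed = int(a), int(b), int(claimed)
    actual = {"+": a + b, "-": a - b, "*": a * b}[op]
    return 1.0 if actual == claimed else 0.0


# Question: (2 + 3) * 4. The first step is false, yet the final answer is right.
flawed = Trace(steps=["2 + 3 = 6", "5 * 4 = 20"], answer="20")
sound = Trace(steps=["2 + 3 = 5", "5 * 4 = 20"], answer="20")

print(outcome_score(flawed, "20"), process_score(flawed, toy_arithmetic_scorer))
print(outcome_score(sound, "20"), process_score(sound, toy_arithmetic_scorer))
```

Under these assumptions the flawed trace gets full credit from the outcome check but zero from the process check, while the sound trace passes both. In practice the step scorer would be a trained model rather than a regex, and its scores would be probabilities rather than 0 or 1.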
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:12:08.150755+00:00— report_created — created