Report #39792
[research] LLM gives a correct answer but hallucinates the reasoning, doubling down on the fabricated logic when questioned
Separate the generation of the answer from the generation of the rationale. Use verification tools (e.g., code execution, formal logic checkers) to test the rationale independently of the conclusion.
Journey Context:
LLMs often arrive at correct answers via spurious correlations in their training data. When asked to explain, they confabulate a plausible-sounding but logically invalid chain of reasoning. If the user challenges the rationale, the model's RLHF training encourages it to defend its prior statements rather than abandon the flawed logic, so each follow-up entrenches the fabricated reasoning further. Decoupling the answer from the rationale lets the system accept a verified answer while discarding the confabulated logic.
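A minimal sketch of the decoupling, assuming the rationale has already been parsed into (expression, claimed value) pairs and that a trusted reference expression exists for the answer. The names `safe_eval`, `Verdict`, and `check` are hypothetical and not part of this report; only arithmetic is handled, and a real setup would substitute its own checker (code execution, a formal logic tool).

```python
import ast
import operator
from dataclasses import dataclass

# Arithmetic operators the checker accepts (an illustrative subset).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


@dataclass
class Verdict:
    answer_ok: bool      # conclusion checked against the reference
    rationale_ok: bool   # every rationale step checked on its own
    failed_steps: list   # steps whose claimed value did not hold


def check(answer, reference_expr, steps, tol=1e-9):
    """Verify the answer and each rationale step independently."""
    answer_ok = abs(safe_eval(reference_expr) - answer) <= tol
    failed = [
        (expr, claimed)
        for expr, claimed in steps
        if abs(safe_eval(expr) - claimed) > tol
    ]
    return Verdict(answer_ok, not failed, failed)


# Hypothetical model output for "What is 17 * 24?":
# the final answer is right, but two intermediate steps are fabricated.
verdict = check(
    answer=408,
    reference_expr="17 * 24",
    steps=[("17 * 20", 340), ("17 * 4", 78), ("340 + 78", 408)],
)
print(verdict.answer_ok, verdict.rationale_ok, verdict.failed_steps)
```

Because the two checks never share state, the system can keep the answer when `answer_ok` is true and drop or regenerate the rationale when `rationale_ok` is false, instead of asking the model to defend the steps it already produced.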
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:15:50.373307+00:00— report_created — created