Report #41127
[research] LLM asked to explain its previous answer generates a plausible but fabricated rationale (Chain-of-Thought unfaithfulness)
Do not rely on post-hoc explanations to verify the factual basis of a prior claim. If reasoning is required, have the model output its reasoning before the final answer (Chain-of-Thought), and treat the resulting reasoning trace as an approximation that may be unfaithful to the model's actual computation.
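A minimal sketch of the pre-hoc ordering, assuming the model is prompted with plain text and its completion is available as a string. The model call itself is omitted because the report names no specific API. The marker strings `Reasoning:` and `Final answer:` and the helpers `build_cot_prompt` and `parse_cot_response` are hypothetical. The parser rejects completions that put the answer before the reasoning, and it keeps the reasoning only as context, not as evidence for the answer.

```python
from dataclasses import dataclass

# Hypothetical section markers; any unambiguous delimiters would work.
REASONING_TAG = "Reasoning:"
ANSWER_TAG = "Final answer:"


def build_cot_prompt(question: str) -> str:
    """Ask for the reasoning *before* the answer, so the answer is generated
    after the reasoning instead of being justified after the fact."""
    return (
        f"{question}\n\n"
        "Think through the problem step by step first.\n"
        f"Start with a line beginning '{REASONING_TAG}' and write your reasoning.\n"
        f"Then, on a new line, write '{ANSWER_TAG}' followed only by the answer."
    )


@dataclass
class CotResponse:
    reasoning: str  # scaffolding for the problem; NOT a faithful audit trail
    answer: str


def parse_cot_response(text: str) -> CotResponse:
    """Split a completion into its reasoning and final-answer sections.

    Raises ValueError if either marker is missing or the answer marker comes
    before the reasoning marker, i.e. the required ordering was not followed.
    """
    r_idx = text.find(REASONING_TAG)
    a_idx = text.find(ANSWER_TAG)
    if r_idx == -1 or a_idx == -1 or a_idx < r_idx:
        raise ValueError("response does not put reasoning before the final answer")
    reasoning = text[r_idx + len(REASONING_TAG):a_idx].strip()
    answer = text[a_idx + len(ANSWER_TAG):].strip()
    return CotResponse(reasoning=reasoning, answer=answer)


if __name__ == "__main__":
    # A hand-written completion standing in for real model output.
    completion = "Reasoning: 17 * 24 = 340 + 68.\nFinal answer: 408"
    print(parse_cot_response(completion).answer)  # prints "408"
```

Because the answer is generated after the reasoning, the reasoning can shape it. As the report notes, however, the `reasoning` field should still not be read as a faithful account of why the model answered as it did.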
Journey Context:
When asked to explain themselves, LLMs do not introspect on their internal weights or activations; they generate plausible text that justifies their earlier output. Post-hoc rationalizations can therefore be highly unfaithful to the computation that actually produced the answer. Reasoning generated before the answer (pre-hoc) often improves accuracy but is still subject to unfaithfulness, so use it to structure the problem, not as a factual audit trail.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:30:08.634380+00:00— report_created — created