Report #57384
[research] Generating a factual claim first, then inventing a plausible but fake reasoning path to justify it
Reverse the generation order: force the model to extract the evidence (quotes) first, then synthesize the conclusion from that evidence alone. Use chain-of-thought prompting in which each reasoning step must be a verbatim quote from the context.
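A minimal sketch of an evidence-first prompt, assuming a plain-text completion interface. The function name `build_evidence_first_prompt`, the `EVIDENCE`/`CONCLUSION:` labels and the instruction wording are hypothetical illustrations, not part of the original report. The prompt ends on the `EVIDENCE:` header, so the model's continuation is primed to start with quotes rather than with an answer.

```python
# Hypothetical helper: the name, section labels and wording are illustrative
# assumptions, not a documented API.
def build_evidence_first_prompt(context: str, question: str) -> str:
    """Build a prompt that asks for verbatim quotes before any conclusion."""
    instructions = (
        "Answer the question using only the context below.\n"
        "Step 1: Under the heading EVIDENCE, copy every sentence from the "
        "context that bears on the question, one per line, each prefixed "
        "with '- ' and wrapped in double quotes. Copy them verbatim; do not "
        "paraphrase.\n"
        "Step 2: Only after the evidence, write a line starting with "
        "'CONCLUSION:' that answers the question using nothing but the "
        "quoted evidence. If the evidence is insufficient, write "
        "'CONCLUSION: insufficient evidence'."
    )
    # End on the EVIDENCE header so the continuation begins with quotes.
    return (
        f"{instructions}\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {question}\n\n"
        "EVIDENCE:\n"
    )


if __name__ == "__main__":
    print(build_evidence_first_prompt(
        "The bridge closed on Monday. Engineers found cracks in the main span.",
        "Why did the bridge close?",
    ))
```

Ending the prompt on the evidence header is a design choice rather than a guarantee: the model can still ignore the instruction, which is why the output should be checked afterwards.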
Journey Context:
When asked 'Why did X happen?', models often produce 'X happened because of Y' by first predicting the most likely 'Y' and then backfilling the reasoning to support it. This is reverse rationalization: if 'Y' is hallucinated, the reasoning can be internally consistent yet rest entirely on a false premise. Requiring the model to retrieve evidence before it states a conclusion constrains that conclusion to what the context actually says.
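Because a hallucinated 'Y' can also show up as a hallucinated "quote", a cheap post-check is to confirm that every quoted line appears verbatim in the context and that the evidence precedes the conclusion. The sketch below assumes the hypothetical output format of quote lines (`- "..."`) followed by a `CONCLUSION:` line; the name `verify_evidence_first` and that format are illustrative assumptions. Matching is exact substring matching, so differences in whitespace or punctuation count as unsupported.

```python
import re

# Assumed output format: evidence lines like  - "quoted sentence"
QUOTE_LINE = re.compile(r'^-\s*"(.+)"\s*$')


def verify_evidence_first(response: str, context: str) -> dict:
    """Split a response into quotes and conclusion, and flag quotes that are
    not verbatim substrings of the context (hypothetical helper)."""
    quotes = []
    conclusion = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if line.startswith("CONCLUSION:"):
            conclusion = line[len("CONCLUSION:"):].strip()
            break  # quotes after the conclusion are ignored, so order matters
        match = QUOTE_LINE.match(line)
        if match:
            quotes.append(match.group(1))
    # Any quote that is not an exact substring of the context is suspect.
    unsupported = [q for q in quotes if q not in context]
    return {
        "quotes": quotes,
        "conclusion": conclusion,
        "unsupported": unsupported,
        # Accept only if evidence came first, every quote is verbatim,
        # and a conclusion was actually given.
        "accepted": bool(quotes) and not unsupported and conclusion is not None,
    }
```

For example, with the context "The bridge closed on Monday. Engineers found cracks in the main span.", a response quoting "Flooding damaged the bridge." is rejected because that sentence is not in the context, and a response that states its conclusion before any quote is rejected because no evidence precedes it.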
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:48:37.268369+00:00 - report_created - created