Report #31505
[research] Generating an answer first (correct or not), then fabricating reasoning to justify it
Force the model to generate the reasoning steps before the final answer. Structure the output format to strictly separate reasoning traces from conclusions.
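A minimal sketch of the output-format side of this fix, assuming the model is asked to wrap its reasoning in `<reasoning>` tags and its conclusion in `<answer>` tags. The tag names, `FORMAT_INSTRUCTIONS`, and `parse_strict` are hypothetical illustrations, not part of any specific library. The parser accepts only outputs where a non-empty reasoning block comes first and the answer block comes last. It also rejects an answer tag smuggled inside the reasoning.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt suffix (illustrative only): reasoning first, answer last.
FORMAT_INSTRUCTIONS = (
    "First think step by step inside <reasoning>...</reasoning>.\n"
    "Only after the closing </reasoning> tag, give the final answer "
    "inside <answer>...</answer>. Do not state the answer before reasoning."
)

# Exactly one reasoning block followed by one answer block; only whitespace
# is allowed before, between, and after them.
_STRICT = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.+?)</reasoning>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


@dataclass
class ParsedOutput:
    reasoning: str
    answer: str


def parse_strict(output: str) -> ParsedOutput:
    """Accept only outputs whose reasoning block precedes the answer block."""
    match = _STRICT.match(output)
    if match is None:
        # Covers answer-first outputs and anything outside the two blocks.
        raise ValueError("output does not follow reasoning-then-answer format")
    reasoning = match.group("reasoning").strip()
    answer = match.group("answer").strip()
    if not reasoning:
        raise ValueError("empty reasoning block")
    if "<answer>" in reasoning:
        # An answer stated inside the reasoning would defeat the ordering.
        raise ValueError("answer tag found inside reasoning block")
    return ParsedOutput(reasoning=reasoning, answer=answer)
```

Rejecting malformed outputs, rather than trying to repair them, keeps answer-first generations from slipping through. The caller can retry or discard them. Note that this only checks ordering; it cannot tell whether the reasoning actually supports the answer.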
Journey Context:
In standard generation, a model may output a conclusion based on superficial pattern matching and then generate a Chain-of-Thought to 'explain' it. If that initial conclusion was a hallucination, the reasoning tends to be hallucinated as well, constructed to justify the bad conclusion (motivated reasoning). Reversing the order, forcing reasoning first, means the conclusion is generated after the reasoning and conditioned on it. The reasoning is no longer fitted to a conclusion the model has already committed to.
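One way to make the ordering structural rather than relying on a single prompt is a two-stage pipeline. The first call asks only for reasoning. The second call produces the answer with that reasoning in its context. This is a swapped-in variant of the single structured output described above. The sketch assumes a hypothetical `generate(prompt) -> str` function standing in for whatever model call is in use. Stage 1 may still leak an answer despite the instruction, so it can be combined with a format check.

```python
from typing import Callable

# Hypothetical model interface (assumption): prompt string in, generated text out.
Generate = Callable[[str], str]


def reason_then_answer(question: str, generate: Generate) -> tuple[str, str]:
    """Two-stage sketch: the answer call sees the question plus the reasoning
    produced in stage 1, so the conclusion is conditioned on that reasoning."""
    # Stage 1: request reasoning only; no final answer is asked for yet.
    reasoning = generate(
        f"Question: {question}\n"
        "Work through the problem step by step. Do not state a final answer."
    )
    # Stage 2: generate the final answer with the stage-1 reasoning in context.
    answer = generate(
        f"Question: {question}\n"
        f"Reasoning:\n{reasoning}\n"
        "Based only on the reasoning above, state the final answer."
    )
    return reasoning, answer.strip()
```

Keeping the two calls separate means the answer prompt cannot exist until the reasoning text does. This is the ordering property the report relies on, enforced by control flow rather than by instructions alone.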
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:16:02.194459+00:00— report_created — created