Report #74590
[research] Fabricating reasoning steps to justify an incorrect answer the model already committed to
Use chain-of-thought, but enforce a verification step *before* the final answer, or use a separate model to verify the reasoning. Do not let the model generate the answer first and then the reasoning.
Journey Context:
When a model commits to an answer quickly, it tends to invent plausible-sounding reasoning to justify that answer (motivated reasoning). Reversing the order (reason -> answer) helps, but a separate process-supervised verifier is more robust against rationalization because it evaluates each step of the logic independently of the conclusion.
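The ordering described above (reason, then verify, then answer) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `answer_with_verification`, `Result`, and the three callables `reason`, `verify`, and `conclude` are hypothetical names standing in for whatever model calls you actually use, and `toy_verify` is a trivial arithmetic checker included only so the sketch runs without a model.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not a real library API).
Reasoner = Callable[[str], list[str]]              # question -> reasoning steps
Verifier = Callable[[str, list[str]], list[bool]]  # question, steps -> per-step verdicts
Answerer = Callable[[str, list[str]], str]         # question, verified steps -> answer


@dataclass
class Result:
    answer: Optional[str]          # None when no reasoning chain passed verification
    steps: list[str]               # the last reasoning chain that was produced
    first_bad_step: Optional[int]  # index of the first rejected step, if any


def answer_with_verification(
    question: str,
    reason: Reasoner,
    verify: Verifier,
    conclude: Answerer,
    max_attempts: int = 2,
) -> Result:
    steps: list[str] = []
    bad: Optional[int] = None
    for _ in range(max_attempts):
        # 1. Reasoning is produced first; no answer exists yet.
        steps = reason(question)
        # 2. The verifier sees only the question and the steps, never a conclusion.
        verdicts = verify(question, steps)
        # A step with no verdict is treated as rejected.
        bad = next(
            (i for i in range(len(steps)) if i >= len(verdicts) or not verdicts[i]),
            None,
        )
        # 3. The answer is derived only from a non-empty, fully verified chain.
        if steps and bad is None:
            return Result(conclude(question, steps), steps, None)
    return Result(None, steps, bad)


# Toy stand-in for a process-supervised verifier: checks "a op b = c" steps.
OPS = {"+": operator.add, "*": operator.mul}


def toy_verify(question: str, steps: list[str]) -> list[bool]:
    verdicts = []
    for step in steps:
        lhs, rhs = step.split("=")
        a, op, b = lhs.split()
        verdicts.append(OPS[op](int(a), int(b)) == int(rhs))
    return verdicts
```

With stubbed callables, a chain such as `["17 * 3 = 51", "51 + 4 = 55"]` passes and an answer is produced, while `["17 * 3 = 41", "41 + 4 = 45"]` is rejected at step 0 and `answer` stays `None`. The design point is that `conclude` is never called on unverified reasoning, so there is no committed answer for the model to rationalize.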
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:47:55.457035+00:00— report_created — created