Report #79826
[research] Chain-of-Thought (CoT) reasoning invents false justifications to support a predetermined, incorrect answer
Force the model to generate its reasoning *before* the answer (Reasoning-Prompting rather than Answer-Prompting), and verify the reasoning chain independently where possible.
Journey Context:
CoT is meant to derive the answer from the reasoning, but models often "think backwards": they settle on an answer intuitively and then generate a plausible-sounding but fabricated reasoning chain to justify it. This is especially common in mathematical and logical reasoning. Strictly enforcing the output format [Reasoning]...[Answer] keeps the model from anchoring on an answer and then rationalizing it, so that the reasoning precedes and determines the conclusion.
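A minimal Python sketch of how the reasoning-first format could be enforced on the client side, plus one narrow form of independent verification. Everything named here (`PROMPT_TEMPLATE`, `build_prompt`, `parse_reasoning_first`, `find_bad_arithmetic`) is hypothetical and not part of any library. It assumes the model is asked to emit exactly one [Reasoning] section followed by one [Answer] section. The arithmetic check is a heuristic that only looks at simple `a op b = c` integer steps, so it can miss fabricated steps written in other forms.

```python
import re

# Hypothetical prompt template: reasoning is requested first, answer last.
PROMPT_TEMPLATE = (
    "Solve the problem below. Write your step-by-step work under a "
    "[Reasoning] heading, then give only the final result under an "
    "[Answer] heading. Do not state the result before the reasoning.\n\n"
    "Problem: {problem}\n"
)

# The output must start with [Reasoning] and contain [Answer] after it.
_SECTION_RE = re.compile(
    r"\A\s*\[Reasoning\](?P<reasoning>.*?)\[Answer\](?P<answer>.*)\Z",
    re.DOTALL,
)

# Simple binary integer steps such as "17 + 25 = 42". The lookbehinds skip
# operands that are part of a longer expression (e.g. "2 * 3 + 4 = 10").
_STEP_RE = re.compile(
    r"(?<![\d.])(?<![+\-*/] )(?<![+\-*/])"
    r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(-?\d+)(?!\d|\.\d)"
)

_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def build_prompt(problem: str) -> str:
    """Fill the reasoning-first template with a problem statement."""
    return PROMPT_TEMPLATE.format(problem=problem)


def parse_reasoning_first(output: str) -> tuple[str, str]:
    """Return (reasoning, answer); raise ValueError on a format violation."""
    match = _SECTION_RE.match(output)
    if match is None:
        raise ValueError("output must be [Reasoning] ... [Answer] ..., in that order")
    reasoning = match.group("reasoning").strip()
    answer = match.group("answer").strip()
    if not reasoning:
        raise ValueError("empty [Reasoning] section")
    if not answer:
        raise ValueError("empty [Answer] section")
    if "[Reasoning]" in answer or "[Answer]" in answer:
        raise ValueError("section markers repeated after [Answer]")
    return reasoning, answer


def find_bad_arithmetic(reasoning: str) -> list[str]:
    """Return the simple arithmetic steps whose stated result is wrong."""
    bad = []
    for m in _STEP_RE.finditer(reasoning):
        left, op, right, claimed = m.groups()
        if _OPS[op](int(left), int(right)) != int(claimed):
            bad.append(m.group(0))
    return bad
```

A caller might reject and retry any response for which `parse_reasoning_first` raises, and flag responses for which `find_bad_arithmetic` returns a non-empty list. Ordering the sections does not by itself prove the reasoning is faithful, which is why the second check is kept separate from the format check.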
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:35:31.535818+00:00— report_created — created