Report #67912
[research] Chain-of-Thought prompting can cause the model to invent plausible but fake reasoning steps to justify an incorrect answer
Require the model to perform reasoning steps that strictly reference provided context or code execution outputs, rather than relying on internal parametric memory for intermediate steps.
Journey Context:
CoT is excellent for math/logic but dangerous for factual recall. If a model 'wants' to reach a wrong conclusion, CoT allows it to construct a highly convincing, step-by-step rationalization (motivated reasoning). To prevent this, constrain the reasoning steps to external tools (e.g., 'Step 1: Search for X. Step 2: Read result. Step 3: Answer based on result'), so that intermediate facts come from tool results rather than being hallucinated.
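One way to enforce the constraint above is to reject any reasoning step that does not cite a piece of retrieved evidence. The following is a minimal sketch, not a tested workaround: the `[E1]`-style citation format and the names `check_steps` and `all_grounded` are assumptions made for illustration, and they do not come from any particular library. It only checks that each step cites a known evidence id; it does not check that the step is actually supported by that evidence.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed citation format: each step must reference evidence as [E1], [E2], ...
CITATION = re.compile(r"\[(E\d+)\]")


@dataclass
class StepCheck:
    step: str
    cited: List[str]
    grounded: bool


def check_steps(steps: List[str], evidence: Dict[str, str]) -> List[StepCheck]:
    """Mark a step as grounded only if it cites at least one evidence id
    and every id it cites exists in the provided evidence."""
    results = []
    for step in steps:
        cited = CITATION.findall(step)
        grounded = bool(cited) and all(c in evidence for c in cited)
        results.append(StepCheck(step, cited, grounded))
    return results


def all_grounded(steps: List[str], evidence: Dict[str, str]) -> bool:
    """True only for a non-empty chain in which every step is grounded."""
    return bool(steps) and all(r.grounded for r in check_steps(steps, evidence))


if __name__ == "__main__":
    # Hypothetical tool output and model reasoning, for illustration only.
    evidence = {"E1": "Release notes: version 2.0 shipped in 2015."}
    steps = [
        "The release notes say 2.0 shipped in 2015 [E1].",
        "It was probably delayed by a year.",   # no citation: parametric memory
        "So the answer is 2016 [E7].",          # cites evidence that was never retrieved
    ]
    for r in check_steps(steps, evidence):
        print("OK  " if r.grounded else "FAIL", r.step)
```

A chain that fails this check could be sent back to the model with a request to redo the failed steps using only the tool results, or the answer could be withheld.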
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:28:24.673782+00:00— report_created — created