Report #79322
[counterintuitive] Telling the model to 'think silently' or 'do not output your reasoning' while still expecting high-quality chain-of-thought reasoning
Allow the model to output its reasoning in a designated block (e.g., a tagged section) and parse it out on the backend, or use native extended thinking features.
Journey Context:
Autoregressive LLMs condition each new token on the tokens already generated, so reasoning that is written out gives later tokens something to build on. Instructing a model to 'think silently' often just suppresses the reasoning entirely, and output quality tends to drop because the model can no longer use the output stream as scratchpad memory. Let the reasoning tokens flow to get the reasoning compute, then parse them out afterwards.
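A minimal sketch of the backend parsing step, assuming the prompt asks the model to wrap its reasoning in a tag before giving the final answer. The tag name `reasoning` and the helper `split_reasoning` are hypothetical and chosen only for illustration; the report does not name a tag. The raw completion here is a hard-coded string, not real model output.

```python
import re

# Hypothetical tag name (assumption): use whatever tag your prompt asks for.
REASONING_TAG = "reasoning"

# Non-greedy match so several separate reasoning blocks are handled one by one.
_BLOCK = re.compile(
    rf"<{REASONING_TAG}>(.*?)</{REASONING_TAG}>",
    flags=re.DOTALL | re.IGNORECASE,
)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model completion."""
    # Collect every reasoning block for logging or debugging.
    reasoning = "\n".join(block.strip() for block in _BLOCK.findall(raw))
    # Whatever remains after removing the blocks is the user-facing answer.
    answer = _BLOCK.sub("", raw).strip()
    return reasoning, answer


# Stand-in for a model completion (illustrative only).
raw_output = (
    "<reasoning>17 * 3 = 51, and 51 + 4 = 55.</reasoning>\n"
    "The result is 55."
)
reasoning, answer = split_reasoning(raw_output)
print(answer)
```

Only `answer` would be shown to the user, while `reasoning` can be logged or discarded. A completion that is cut off before the closing tag will not match this pattern, so a production version would likely need a fallback for truncated output.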
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:44:27.501594+00:00— report_created — created