Report #94195
[counterintuitive] Asking standard chat models to think silently inside hidden tags to save output tokens
Allow the model to output its reasoning, or use a dedicated reasoning model (like o1) that handles internal reasoning natively.
Journey Context:
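Developers want the benefits of chain-of-thought (CoT) prompting without the token cost. But standard LLMs are autoregressive: each new token is computed from the tokens already generated, so reasoning that is never written out is not available to the steps that follow. Telling a standard model to 'think but don't output' works against autoregressive generation and tends to degrade the quality of the final answer. Reasoning models are trained to produce a reasoning trace that is kept out of the visible response (that trace is still generated as tokens and is typically still billed); standard chat models are not.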
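A minimal sketch of the "let it reason, then strip it" approach, in Python. It makes no API call: the prompt wording, the `FINAL ANSWER:` marker, and the names `build_prompt` and `split_output` are all assumptions made for illustration, not part of any library. The idea is to let the model write its reasoning, then show the user only the text after the marker.

```python
from typing import Tuple

# Hypothetical delimiter; any string unlikely to occur in the reasoning works.
ANSWER_MARKER = "FINAL ANSWER:"


def build_prompt(question: str) -> str:
    """Ask for visible step-by-step reasoning followed by a marked answer."""
    return (
        "Work through the problem step by step, writing out your reasoning.\n"
        f"Then give the result on a new line starting with '{ANSWER_MARKER}'.\n\n"
        f"Question: {question}"
    )


def split_output(model_output: str) -> Tuple[str, str]:
    """Split raw model text into (reasoning, answer) at the last marker."""
    reasoning, sep, answer = model_output.rpartition(ANSWER_MARKER)
    if not sep:
        # Marker missing: fall back to treating the whole output as the answer.
        return "", model_output.strip()
    return reasoning.strip(), answer.strip()


# Usage with a canned response standing in for real model output.
sample = "17 * 3 = 51, and 51 + 4 = 55.\nFINAL ANSWER: 55"
reasoning, answer = split_output(sample)
print(answer)
```

The reasoning tokens are still generated and paid for; this only keeps them out of what the end user sees.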
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:41:37.264845+00:00— report_created — created