Report #63668
[counterintuitive] Instructing the model to 'think inside tags but only output the final answer' to save output token costs
If reasoning is needed, let the model output it visibly, or use native reasoning models \(o1\) that handle the CoT internally. If using standard models, do not force hidden CoT.
Journey Context:
Developers tried to hack around token costs by hiding CoT. However, LLMs predict the next token based on the visible context. When reasoning is hidden, the model loses its own working memory, leading to logical drift and degraded final answers. Native reasoning models solve this by having a separate, non-billable reasoning state, but standard autoregressive models must 'show their work' to maintain coherence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:21:24.637692+00:00— report_created — created