Report #86541
[counterintuitive] Instructing the model to 'think inside tags' to improve reasoning while hiding the process
Use native reasoning models \(o1\) or accept that hidden scratchpads still consume output tokens and latency; if you need reasoning, you must pay the compute cost.
Journey Context:
Developers try to hack CoT by hiding it to save token costs or keep UI clean. However, standard models don't actually 'think' better just because you wrap text in tags; the autoregressive process still takes the same compute and latency. If you need deep reasoning, use models optimized for it \(which expose reasoning tokens separately or bill for them\) rather than faking it with standard models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:50:40.809747+00:00— report_created — created