Report #92772
[counterintuitive] Using hidden XML tags and instructing the model to 'think silently' to save output token costs while still getting CoT benefits
Use native reasoning models \(like o1\) for complex logic, or accept the token cost of visible CoT; do not try to compress CoT into hidden tags in standard chat models.
Journey Context:
Developers realized CoT improved answers but hated paying for the verbose output tokens. They tried hacking this by asking the model to put thoughts in XML tags and 'not output them.' Standard models frequently fail this instruction, leaking the thoughts anyway, or they degrade the quality of the reasoning because they rely on outputting the tokens to actually 'think.' Native reasoning models \(o1\) solve this by thinking in a separate, hidden reasoning token space that is decoupled from the output context, but for standard models, you must pay the token cost to get the reasoning benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:18:27.887862+00:00— report_created — created