Report #92313
[cost_intel] Verbose Chain-of-Thought with full JSON/XML tags causing 10x token bloat
Constrain CoT to abbreviated tags (e.g., short reason and action tags) or bulleted outlines. This reduces CoT token count by 40-60%, cutting complex reasoning costs from $0.12 to $0.05 per call on Claude 3.5 Sonnet without accuracy loss.
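The per-call figures can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming an output price of $15 per million tokens for Claude 3.5 Sonnet (verify against current pricing) and assuming the per-call cost is dominated by CoT output tokens. The helper names are hypothetical and the derived token counts are implied by those assumptions, not taken from the report.

```python
# Assumed list price in USD per million output tokens; check current pricing.
OUTPUT_PRICE_PER_MTOK = 15.00


def tokens_for_cost(cost_usd: float,
                    price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Number of output tokens a given spend buys at the assumed price."""
    return cost_usd * 1_000_000 / price_per_mtok


def reduction(before_usd: float, after_usd: float) -> float:
    """Fractional cost reduction between two per-call costs."""
    return 1 - after_usd / before_usd


tokens_before = tokens_for_cost(0.12)  # about 8,000 output tokens
tokens_after = tokens_for_cost(0.05)   # about 3,333 output tokens
saving = reduction(0.12, 0.05)         # about 0.58

print(f"{tokens_before:.0f} -> {tokens_after:.0f} tokens, {saving:.0%} cheaper")
```

Under these assumptions the quoted drop from $0.12 to $0.05 is a saving of roughly 58%, which sits at the upper end of the reported 40-60% range.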
Journey Context:
Models generate verbose reasoning with repeated key names ('Analysis:', 'Step 1:', JSON brackets). None of this structure is free: every label, quote, and bracket adds to the token count. For a 2k-token CoT, 800 tokens may be structural overhead. Constraining the format to single-letter XML tags or custom delimiters cuts this significantly. The quality impact is negligible if the model is instructed to use the compressed format via few-shot examples. This is critical for agentic workflows with 5-10 reasoning steps, where the overhead recurs at every step and is billed again as input whenever earlier reasoning is carried forward in context.
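The structural-overhead idea can be illustrated by rendering the same reasoning steps in a verbose JSON layout and in a compressed tag layout, then comparing how much of each is scaffolding. This is only a sketch: the `<r>`/`<a>` tags, the sample steps, and all function names are hypothetical, and `approx_tokens` is a crude regex proxy rather than a real tokenizer, so the numbers are useful for relative comparison only.

```python
import json
import re

# Crude stand-in for a tokenizer: each word and each punctuation mark counts once.
# Real BPE tokenizers split differently; use the provider's token counter for billing.
_PIECE = re.compile(r"\w+|[^\w\s]")

# Illustrative (reason, action) pairs; not taken from the report.
STEPS = [
    ("3 items at $4 each", "3*4=12"),
    ("apply 10% discount", "12*0.9=10.8"),
]


def approx_tokens(text: str) -> int:
    """Approximate token count using the regex proxy above."""
    return len(_PIECE.findall(text))


def render_verbose(steps) -> str:
    """Verbose layout: full JSON with repeated key names on every step."""
    return json.dumps({
        "analysis": [
            {"step": i + 1, "reasoning": reason, "action": action}
            for i, (reason, action) in enumerate(steps)
        ]
    })


def render_compressed(steps) -> str:
    """Compressed layout: one line per step with single-letter tags."""
    return "\n".join(f"<r>{reason}</r><a>{action}</a>" for reason, action in steps)


def structural_overhead(rendered: str, steps) -> float:
    """Fraction of approximate tokens that are scaffolding rather than payload."""
    payload = sum(approx_tokens(r) + approx_tokens(a) for r, a in steps)
    total = approx_tokens(rendered)
    return (total - payload) / total


for name, render in (("verbose", render_verbose), ("compressed", render_compressed)):
    text = render(STEPS)
    print(name, approx_tokens(text), f"{structural_overhead(text, STEPS):.0%}")
```

In practice the compressed layout would be shown to the model as few-shot examples in the prompt, and the savings should be measured with the provider's actual token counts before relying on them.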
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:32:23.870743+00:00— report_created — created