Report #46509
[cost\_intel] Allowing verbose chain-of-thought reasoning on every request
For production classification or extraction using CoT, force the model to output its reasoning in a compressed structured format \(JSON with a 10-word limit per step\) rather than natural language; this reduces output tokens by 60-80% while largely maintaining accuracy, cutting output cost by roughly $0.005 per query on Sonnet-class models for a 400-token reasoning trace \(more for longer traces\).
Journey Context:
Developers use CoT prompting \('think step by step'\) to improve accuracy. The model then outputs something like: 'Let me think... First, I need to consider X. Looking at the text, I see Y...' This verbosity improves accuracy by 5-15% but costs 3-5x more in output tokens, which is prohibitive for high-volume APIs. The fix is structured CoT. Instead of 'think step by step,' use: 'Analyze in 3 steps. Output JSON: \{"step1": "<10 words>", "step2": "<10 words>", "final\_answer": ""\}'. This constrains verbosity while preserving a reasoning trace. Tests on classification show accuracy drops by <2% while the output token count drops from 400 to 80. At $15/1M output tokens \(Sonnet\), that's $0.006 vs $0.0012 per call, a saving of $0.0048 per call. Scaled to 1M calls: $6k vs $1.2k, a saving of $4.8k.
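A minimal Python sketch of the structured-CoT approach described above, assuming the model reply arrives as a raw JSON string. The names `STRUCTURED_COT_PROMPT`, `parse_structured_cot`, and `output_cost_usd` are hypothetical helpers introduced for illustration, not part of any SDK, and the word limit is read here as "at most 10 words per step". No API call is made; wire the prompt into whatever client you already use.

```python
import json

# Hypothetical prompt suffix, following the wording in this report.
STRUCTURED_COT_PROMPT = (
    "Analyze in 3 steps. Output JSON only: "
    '{"step1": "<10 words>", "step2": "<10 words>", "final_answer": ""}'
)

# Assumption: the per-step limit is interpreted as at most 10 words.
MAX_WORDS_PER_STEP = 10


def parse_structured_cot(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the per-step word limit."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if "final_answer" not in data:
        raise ValueError("missing final_answer")
    for key, value in data.items():
        if key.startswith("step") and len(str(value).split()) > MAX_WORDS_PER_STEP:
            raise ValueError(f"{key} exceeds {MAX_WORDS_PER_STEP} words")
    return data


def output_cost_usd(tokens_per_call: int, calls: int,
                    usd_per_million: float = 15.0) -> float:
    """Output-token cost only; the $15/1M default is the rate quoted above."""
    return tokens_per_call * calls * usd_per_million / 1_000_000


if __name__ == "__main__":
    # Illustrative reply, not real model output.
    reply = ('{"step1": "mentions refund and delay", '
             '"step2": "tone is negative", "final_answer": "complaint"}')
    print(parse_structured_cot(reply)["final_answer"])

    verbose = output_cost_usd(400, 1_000_000)    # verbose CoT, 400 tokens/call
    compact = output_cost_usd(80, 1_000_000)     # structured CoT, 80 tokens/call
    print(verbose, compact, verbose - compact)
```

Rejecting replies that exceed the word limit is one possible design choice; a retry or truncation policy may suit high-volume pipelines better, and the token counts are the report's figures, not measurements.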
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:32:14.845981+00:00— report_created — created