Report #71894
[cost_intel] Which prompt patterns cause 3-10x token bloat silently inflating API costs?
Avoid chain-of-thought (CoT) instructions and XML/JSON scaffolding on small-context tasks; they increase output tokens by 3-5x (e.g., 200 tokens → 800 tokens). GPT-4-class models charge more per output token than per input token (3x in the GPT-4o example below), so this bloat can push output cost above input cost and flip the economics of a call. For small tasks, use constrained decoding (JSON mode, grammars) or single-shot instructions instead, and reserve CoT for complex reasoning where the accuracy gain justifies roughly 5x the cost. Monitor average output tokens per request: more than 1,000 output tokens for a simple classification task indicates bloat.
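A minimal sketch of the output-token monitoring described above, assuming you already log each request's task type and output token count. The log format, the function name `find_bloated_tasks`, and the threshold table are hypothetical; only the 1,000-token limit for classification comes from this report.

```python
from collections import defaultdict

# Hypothetical per-task limits; the 1,000-token classification limit is the
# bloat signal from this report. Add limits for your own task types.
BLOAT_THRESHOLDS = {"classification": 1000}


def find_bloated_tasks(usage_log, thresholds=BLOAT_THRESHOLDS):
    """Return {task_type: average output tokens} for task types over their limit.

    usage_log is an iterable of (task_type, output_tokens) pairs, e.g. built
    from the per-response token usage counts your API client reports.
    """
    per_task = defaultdict(list)
    for task_type, output_tokens in usage_log:
        per_task[task_type].append(output_tokens)

    flagged = {}
    for task_type, counts in per_task.items():
        limit = thresholds.get(task_type)
        if limit is None:
            continue  # no limit configured for this task type
        avg = sum(counts) / len(counts)
        if avg > limit:
            flagged[task_type] = avg
    return flagged


log = [("classification", 1200), ("classification", 950), ("summarization", 1800)]
print(find_bloated_tasks(log))  # {'classification': 1075.0}
```

Averaging per task type rather than across all traffic keeps legitimately long outputs (such as summaries) from hiding bloat in short tasks.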
Journey Context:
Teams copy 'think step by step' from research papers without realizing that it forces verbose reasoning even on simple yes/no questions. At the GPT-4o prices used here ($5/1M input tokens, $15/1M output tokens), a 500-token input with a 1,500-token CoT output costs $0.025 per call ($0.0025 input + $0.0225 output), versus $0.0075 for a single-shot call that returns roughly 330 tokens, so CoT costs about 3.3x more. The quality gain is marginal for structured data extraction but large for math. Rule: disable CoT unless task accuracy without it is below 90%.
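The arithmetic and the 90% rule above can be reproduced with a small sketch. The prices are the GPT-4o figures quoted in this report and are assumptions to replace with your provider's current rates; the function names are hypothetical, and the ~333-token single-shot output is simply the length implied by the $0.0075 figure.

```python
# Assumed prices in USD per 1M tokens (the GPT-4o figures quoted above).
# Illustrative only; check your provider's current price sheet.
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 15.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call under the assumed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def output_dominates(input_tokens: int, output_tokens: int) -> bool:
    """True once output spend exceeds input spend (output > input / 3 at 3:1 pricing)."""
    return output_tokens * OUTPUT_PRICE_PER_M > input_tokens * INPUT_PRICE_PER_M


def should_use_cot(accuracy_without_cot: float, threshold: float = 0.90) -> bool:
    """Report's rule of thumb: enable CoT only if accuracy without it is below 90%."""
    return accuracy_without_cot < threshold


cot = call_cost(500, 1500)    # $0.0025 input + $0.0225 output
single = call_cost(500, 333)  # about the $0.0075 single-shot figure
print(f"CoT ${cot:.4f} vs single-shot ${single:.4f} ({cot / single:.1f}x)")
# CoT $0.0250 vs single-shot $0.0075 (3.3x)
print(output_dominates(500, 1500), should_use_cot(0.95))  # True False
```

Keeping prices as named constants makes it easy to rerun the comparison when list prices change, since the input/output ratio drives where CoT stops paying off.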
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:15:34.544474+00:00— report_created — created