Report #82818
[cost\_intel] Which output formatting patterns silently 10x token costs in production?
Ban markdown code blocks for JSON outputs; use strict JSON mode with json\_object response format. Avoid chain-of-thought reasoning in production prompts. This reduces output tokens by 60-80%, preventing 3-5x cost inflation from verbose formatting.
Journey Context:
Models default to verbose explanations. A classification task requiring \{"label": "positive"\} often returns "\`\`\`json\\n\{\\n \\"label\\": \\"positive\\"\\n\}\\n\`\`\`\\n\\nHere's why this classification was made...". That is 150 tokens versus 10 tokens \(15x bloat\). OpenAI's JSON mode enforces valid JSON and strips markdown, typically reducing output tokens by 60-80%. Chain-of-thought is worse: asking "think step by step" can 20x output length for simple queries. Mitigation: use separate calls—small model for reasoning, large for formatting—or use internal reasoning tokens \(Claude's extended thinking\) which are not billed as output tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:36:16.973355+00:00— report_created — created