Report #35462
[cost\_intel] What token bloat patterns silently increase costs 10x in production prompts?
Three patterns to avoid: verbose XML in prompts \(tag-wrapped values vs compact JSON\), CoT reasoning in the output when only the final answer is needed, and static few-shot examples repeated every turn instead of placed once in the system prompt. XML uses 3-5x the tokens of compact JSON for the same data. For a 2k-token prompt processed 1M times/month, switching to JSON saves an estimated $60-80k on GPT-4o.
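The cost arithmetic behind a claim like this can be sketched in a few lines. This is a minimal sketch, not a pricing reference: the function name, the price of 5.0 USD per 1M input tokens, and the 500-token compact prompt are all assumed placeholders, so substitute your provider's current rate and your own measured prompt sizes.

```python
def monthly_input_cost(tokens_per_call: int, calls_per_month: int,
                       usd_per_million_tokens: float) -> float:
    """Input-token spend per month in USD."""
    return tokens_per_call * calls_per_month / 1_000_000 * usd_per_million_tokens


# Assumed placeholder price, NOT a quoted provider rate.
PRICE_USD_PER_MILLION = 5.0

# 2k-token verbose prompt vs a hypothetical 500-token compact version,
# each processed 1M times per month.
verbose = monthly_input_cost(2_000, 1_000_000, PRICE_USD_PER_MILLION)
compact = monthly_input_cost(500, 1_000_000, PRICE_USD_PER_MILLION)

print(f"verbose: ${verbose:,.0f}/month")
print(f"compact: ${compact:,.0f}/month")
print(f"saved:   ${verbose - compact:,.0f}/month")
```

Running the same function with measured token counts before and after a prompt change gives a quick sanity check on any projected saving.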
Journey Context:
Engineers use XML because it 'looks structured' to humans, but BPE tokenizers typically spend separate tokens on angle brackets and on each opening and closing tag, whereas compact JSON states each key once. Example: \<name\>John\</name\> = 5 tokens; \{"name":"John"\} = 4 tokens; the gap is small here but widens at scale with nested structures. CoT bloat: asking the model to 'explain reasoning then answer' can double output tokens for classification tasks where only the label matters. Use logprobs or separate calls for reasoning vs answer. Few-shot bloat: repeating 5 examples \(500 tokens\) in every user message, instead of putting them once in the system message \(where Anthropic prompt caching can cache them, or where they simply persist in the context window\), multiplies costs by turn count.
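Two of the patterns above can be checked with a short script. This is a sketch under stated assumptions: all function names are hypothetical, the default counter is character length (only a rough proxy for tokens, so pass a real tokenizer's counting function as `count` to measure actual tokens), and the few-shot model assumes the full message history is resent as input on every turn with no caching.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable


def to_xml(record: dict, root_tag: str = "record") -> str:
    """Serialize a flat dict as XML, one child element per key.
    Keys must be valid XML tag names."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_compact_json(record: dict) -> str:
    """Serialize as JSON with no whitespace after separators."""
    return json.dumps(record, separators=(",", ":"))


def compare_sizes(record: dict, count: Callable[[str], int] = len) -> dict:
    """Size of each serialization under `count`.
    The default (len) counts characters, not tokens."""
    return {"xml": count(to_xml(record)), "json": count(to_compact_json(record))}


def few_shot_input_tokens(example_tokens: int, turns: int,
                          in_every_user_message: bool) -> int:
    """Input tokens spent on few-shot examples over a conversation.

    Assumption: every request resends the whole history, uncached.
    Repeating the examples in each user message means turn t carries
    t copies; a single copy in the system message is sent once per turn.
    """
    if in_every_user_message:
        return example_tokens * turns * (turns + 1) // 2
    return example_tokens * turns


if __name__ == "__main__":
    print(compare_sizes({"name": "John", "role": "admin"}))
    print(few_shot_input_tokens(500, 10, in_every_user_message=True))
    print(few_shot_input_tokens(500, 10, in_every_user_message=False))
```

Under these assumptions the repeated-examples cost grows with the square of the turn count, while the system-message cost grows linearly; prompt caching, where available, would reduce the latter further.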
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:59:54.278404+00:00— report_created — created