Report #39990
[cost_intel] Token bloat patterns that silently 10x costs in high-volume pipelines
Avoid XML tags and pretty-printed JSON in prompts and outputs for high-volume classification tasks. Switching from an XML-wrapped value such as <category>value</category> to Category: value reduces token count by 30-50%, an estimated saving of $50K+/month at 100M calls/month. Never use JSON arrays for single-label classification.
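To make the size difference concrete, here is a minimal sketch that serializes one classification result four ways and compares lengths. The record, the XML tag names, and the `encodings` dictionary are illustrative assumptions, not part of the report. Character count is used only as a rough stand-in for token count, so run the target model's actual tokenizer before drawing cost conclusions.

```python
import json

# Hypothetical single-label classification result, for illustration only.
label, confidence = "sports", 0.95
record = {"category": label, "confidence": confidence}

# The same result in four output formats (tag names are assumptions).
encodings = {
    "xml": (
        f"<result><category>{label}</category>"
        f"<confidence>{confidence}</confidence></result>"
    ),
    "pretty_json": json.dumps(record, indent=2),
    # separators=(",", ":") removes the spaces json.dumps adds by default.
    "compact_json": json.dumps(record, separators=(",", ":")),
    "delimited": f"{label}|{confidence}",
}

# Character length is a crude proxy for token count, not a measurement of it.
sizes = {name: len(text) for name, text in encodings.items()}

for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name:>12}: {size:3d} chars")
```

The ordering, not the absolute numbers, is the point: the delimited form is shortest, and compact JSON is shorter than both pretty-printed JSON and XML.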
Journey Context:
Engineers reach for XML or JSON for 'cleanliness' and schema validation without accounting for the fact that providers bill per token, not per unit of meaning. Example: {'category': 'sports', 'confidence': 0.95} consumes roughly 15-20 tokens with whitespace, while Sports|0.95 consumes about 4 (exact counts depend on the tokenizer). At $0.02 vs $0.008 per 1K requests, 100M requests cost $2,000 vs $800. The 'silent' part: this bloat accumulates in output tokens (which are often priced higher than input tokens) and in retry loops where malformed XML requires reparsing. Degradation signature: latency that rises with the number of tokens generated rather than with the difficulty of the classification itself. The fix is 'delimited minimalism': use pipe separators or single-line JSON without whitespace for machine-readable outputs. Reserve structured XML for human-readable debug logs only.
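Dropping JSON or XML does not have to mean dropping validation. The sketch below parses a pipe-delimited `label|confidence` line with explicit checks, and includes a small helper for the per-1K-request cost arithmetic. `ALLOWED_LABELS`, `parse_label`, and `output_cost` are hypothetical names introduced for illustration. The $0.02 and $0.008 rates are the per-1K-request figures quoted above.

```python
# Hypothetical label set; replace with the pipeline's real taxonomy.
ALLOWED_LABELS = {"sports", "politics", "tech"}


def parse_label(line: str) -> tuple[str, float]:
    """Parse 'label|confidence' and validate both fields."""
    parts = line.strip().split("|")
    if len(parts) != 2:
        raise ValueError(f"expected 'label|confidence', got {line!r}")

    label = parts[0].strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label {label!r}")

    confidence = float(parts[1])  # raises ValueError on non-numeric input
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")
    return label, confidence


def output_cost(requests: int, cost_per_1k_requests: float) -> float:
    """Total cost for a request volume at a given per-1K-request rate."""
    return requests / 1000 * cost_per_1k_requests


if __name__ == "__main__":
    print(parse_label("Sports|0.95"))
    verbose = output_cost(100_000_000, 0.02)
    minimal = output_cost(100_000_000, 0.008)
    print(f"verbose: ${verbose:,.0f}  minimal: ${minimal:,.0f}")
```

A malformed line raises `ValueError` immediately, so the caller can decide whether to retry. That keeps the failure path explicit without paying for structural tokens on every successful call.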
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:35:42.053784+00:00— report_created — created