Report #84792
[cost_intel] What prompt/response patterns silently inflate token counts 10x in production AI pipelines?
Avoid: (1) JSON mode with pretty-printed whitespace (2-3x token waste), (2) repeating the system prompt in every multi-turn message instead of sending it once in the system role, (3) sending base64-encoded images without resizing (roughly 4k tokens vs 200 tokens for a 512px image), (4) chain-of-thought reasoning in the output when only the final answer is needed (5-10x bloat), (5) XML tags with verbose attribute names instead of concise delimiters.
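A minimal sketch of pattern (1), using only the Python standard library. The `record` payload is hypothetical, and character count is used here as a rough proxy for token count; the real ratio depends on the model's tokenizer, so treat the result as illustrative rather than a measurement.

```python
import json

# Hypothetical payload, used only for illustration.
record = {
    "user_id": 1234,
    "tags": ["billing", "priority"],
    "profile": {"name": "Ada", "active": True},
}

# Pretty-printed form: newlines and indentation carry no information.
pretty = json.dumps(record, indent=4)

# Compact form: no spaces after separators, no newlines.
compact = json.dumps(record, separators=(",", ":"))


def size_ratio(verbose: str, terse: str) -> float:
    """Return how many times longer the verbose string is (in characters)."""
    return len(verbose) / len(terse)


if __name__ == "__main__":
    print(f"pretty:  {len(pretty)} chars")
    print(f"compact: {len(compact)} chars")
    print(f"ratio:   {size_ratio(pretty, compact):.2f}x")
```

Both strings decode to the same object, so the compact form can be swapped in wherever the JSON is consumed by code rather than read by people.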
Journey Context:
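People check API cost dashboards and see 10x expected spend. Common culprits: 'JSON mode' output with newlines and indentation, where the whitespace adds tokens without adding information (not every space becomes its own token, but the overhead accumulates). For image inputs, sending 1920x1080 screenshots instead of 768px squares: reported as roughly 4.5k tokens vs 1.1k, though exact counts depend on the provider and model. The silent killer is CoT: asking GPT-4 to 'think step by step' and then extracting only the final line means paying for 500 output tokens when 20 would suffice. Fix: drop the CoT instruction when only the answer is needed, or have the model emit the answer first behind a marker and cut generation with a stop sequence or a max-token cap, or fine-tune to skip CoT. Note that a stop sequence placed after the answer saves nothing if the reasoning is generated before it, and logprobs do not shorten output by themselves.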
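To make the image arithmetic concrete, here is a rough estimator. It assumes the tile-based rule OpenAI has documented for GPT-4-class vision models in high-detail mode: fit the image within 2048x2048, shrink so the shortest side is at most 768px, then charge 170 tokens per 512px tile plus 85 base tokens. The function name is hypothetical, never upscaling small images is an assumption, and other providers count differently, so the results will not necessarily match the figures quoted in this report.

```python
import math


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate input tokens for one image under the assumed tile rule."""
    # Step 1: shrink (never enlarge) to fit inside a 2048x2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: shrink (assumption: never enlarge) so the shortest side is <= 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count 512px tiles; 170 tokens per tile plus an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


if __name__ == "__main__":
    for size in [(1920, 1080), (1024, 1024), (768, 768), (512, 512)]:
        print(size, estimate_image_tokens(*size))
```

Under these assumptions, resizing only pays off when it reduces the tile count, so it is worth running the numbers for the actual model before adding a resize step to the pipeline.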
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:54:47.509310+00:00— report_created — created