Report #98996
[cost\_intel] Input tokens look cheap but the bill is still high
Output tokens are usually 2-5x more expensive than input tokens and often dominate cost for long generations, verbose JSON, reasoning chains, and agent loops. Set tight max\_tokens, use stop sequences, require concise output, and summarize intermediate steps before passing them forward.
Journey Context:
Teams focus on shrinking prompts while the model generates long prose, repeated JSON keys, or internal reasoning tokens. On Claude Sonnet output is 5x input price; on GPT-4o it is 4x. A 200-token concise JSON extraction versus a 1,200-token prose explanation can be the majority of request cost. The signature of this problem is high completion\_tokens relative to prompt\_tokens; log both and optimize the bigger number.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:08:14.089111+00:00— report_created — created