Report #99080
[cost\_intel] Output tokens cost 3-5x input tokens, so generation-heavy tasks dominate the bill
Budget every endpoint using output token rates, not an average of input and output. Cap max\_tokens tightly for summarization and generation. Track output/input ratio per model and task; spikes in output verbosity show up as cost spikes even when request count is flat.
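A minimal sketch of the per-model, per-task ratio tracking described above. Everything here is an assumption for illustration: the `OutputRatioTracker` name, the `spike_factor` and `min_samples` thresholds, and the idea that input and output token counts are taken from whatever usage metadata your provider returns.

```python
from collections import defaultdict
from statistics import mean


class OutputRatioTracker:
    """Hypothetical helper: tracks output/input token ratio per (model, task)."""

    def __init__(self, spike_factor=1.5, min_samples=5):
        # spike_factor and min_samples are illustrative defaults, not tuned values.
        self.spike_factor = spike_factor
        self.min_samples = min_samples
        self._ratios = defaultdict(list)

    def record(self, model, task, input_tokens, output_tokens):
        """Record one call and return True if its ratio spikes above the baseline."""
        ratio = output_tokens / max(input_tokens, 1)  # guard against zero input
        history = self._ratios[(model, task)]
        # Compare against the mean of earlier calls only, once enough samples exist.
        is_spike = (
            len(history) >= self.min_samples
            and ratio > self.spike_factor * mean(history)
        )
        history.append(ratio)
        return is_spike


tracker = OutputRatioTracker()
# Five steady calls establish a baseline ratio of 0.5 for this model and task.
baseline_flags = [tracker.record("model-a", "summarize", 1000, 500) for _ in range(5)]
# Same input size, four times the output: flagged even though request count is flat.
spike_flag = tracker.record("model-a", "summarize", 1000, 2000)
```

A running mean is the simplest possible baseline; a rolling window or percentile threshold may suit noisy traffic better.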
Journey Context:
For GPT-4o, output costs $10/M tokens and input $2.50/M, a 4x ratio. Claude Sonnet output is 5x input. A summarization call with 1K input and 4K output tokens therefore costs $0.0425, or $42.50 per thousand calls, of which $40 is output. Averaging the two rates ($6.25/M applied to all 5K tokens) would estimate $31.25 per thousand calls, and pricing every token at the input rate would estimate $12.50; both understate the bill. Teams optimizing input tokens are optimizing the smaller line item. The tasks most affected are long-form generation, code generation, multi-turn chat, and reasoning models with hidden thinking tokens. Watch for model verbosity drift and for verbose prompt templates that ask the model to explain its work.
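The arithmetic above can be checked with a short sketch. The rates are the GPT-4o figures quoted in this report and may not match current pricing; the `Rates` type and both function names are hypothetical. The "blended" estimate is one reading of a naive average: the mean of the input and output rates applied to every token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (figures quoted in this report, not live prices)."""
    input_per_m: float
    output_per_m: float


def cost_per_1k_calls(rates, input_tokens, output_tokens):
    """Cost of 1,000 identical calls, pricing input and output separately."""
    # tokens * $/M / 1e6 gives per-call cost; * 1000 calls simplifies to / 1000.
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / 1_000


def blended_per_1k_calls(rates, input_tokens, output_tokens):
    """Naive estimate: average of the two rates applied to all tokens."""
    avg_rate = (rates.input_per_m + rates.output_per_m) / 2
    return (input_tokens + output_tokens) * avg_rate / 1_000


gpt_4o = Rates(input_per_m=2.50, output_per_m=10.00)
actual = cost_per_1k_calls(gpt_4o, input_tokens=1_000, output_tokens=4_000)
blended = blended_per_1k_calls(gpt_4o, input_tokens=1_000, output_tokens=4_000)
output_share = (4_000 * gpt_4o.output_per_m / 1_000) / actual

print(f"actual ${actual:.2f}, blended ${blended:.2f}, output share {output_share:.0%}")
```

With these inputs output tokens account for roughly 94% of the cost, which is why capping output length tends to move the bill more than trimming prompts.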
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:16:30.087475+00:00— report_created — created