Report #76504
[cost_intel] Optimizing only input token costs while ignoring that output tokens typically cost 3-5x more per token than input tokens at major providers
For generation-heavy tasks (code generation, long-form writing, translation, summarization), optimize output token cost first. When your output/input ratio exceeds 3:1, output cost dominates and input optimization is low-ROI.
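As a minimal sketch of this rule of thumb, assuming flat list pricing (no prompt caching or batch discounts), the share of per-call cost spent on output depends only on the output/input token ratio r and the output/input price ratio k: share = r * k / (1 + r * k). The function name is illustrative, not from any library.

```python
def output_cost_share(out_in_ratio: float, price_ratio: float) -> float:
    """Fraction of a call's cost spent on output tokens.

    out_in_ratio: output tokens divided by input tokens (3.0 means 3:1).
    price_ratio: output price divided by input price (15 / 3 = 5 for Claude 3.5 Sonnet).
    Assumes flat list pricing with no caching or batch discounts.
    """
    if out_in_ratio < 0 or price_ratio <= 0:
        raise ValueError("out_in_ratio must be >= 0 and price_ratio > 0")
    weighted_output = out_in_ratio * price_ratio  # output cost in units of input cost
    return weighted_output / (1.0 + weighted_output)


if __name__ == "__main__":
    # Sweep token ratios for the 4x (GPT-4o) and 5x (Claude 3.5 Sonnet) price gaps quoted in this report.
    for out_in_ratio in (0.2, 1.0, 3.0):
        for price_ratio in (4.0, 5.0):
            share = output_cost_share(out_in_ratio, price_ratio)
            print(f"{out_in_ratio}:1 tokens, {price_ratio}x price -> output is {share:.0%} of cost")
```

With a 5x price gap, output cost already equals input cost at a 0.2:1 token ratio (r = 1/k), and at 3:1 output is 15/16 of the bill, so the 3:1 threshold is a conservative cutoff.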
Journey Context:
Developers often focus on compressing input prompts to save costs, but output tokens are 3-5x more expensive per token at most providers (Claude 3.5 Sonnet: $3/1M input vs $15/1M output; GPT-4o: $2.50/1M input vs $10/1M output). Take a code generation task on Claude 3.5 Sonnet that consumes 500 input tokens and produces 2000 output tokens: input costs $0.0015 and output costs $0.03 per call, so about 95% of the cost is output tokens. Compressing the input by 50% saves $0.00075/call; reducing the output by 50% saves $0.015/call, 20x more. The strategies:
(1) Ask explicitly for concise output ('return only the function body, no comments or explanation').
(2) Use smaller models for generation tasks where quality is adequate: Haiku at $1/$5 per 1M input/output tokens is 3x cheaper on output than Sonnet.
(3) Consider whether you need full prose or can use structured formats that are more token-efficient.
(4) Set max_tokens aggressively; many tasks produce adequate results in half the default token budget. max_tokens is a hard cap, so a response that reaches it is cut off rather than shortened.
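The per-call arithmetic above can be reproduced with a short calculator. This is a sketch: the PRICES table holds the list prices quoted in this report (USD per 1M tokens), its keys are informal labels rather than official API model IDs, and it ignores prompt caching and batch discounts. Because max_tokens is a hard cap, call_cost(model, input_tokens, max_tokens) also gives the worst-case cost of a capped call.

```python
# List prices quoted in this report, USD per 1M tokens as (input, output).
# Keys are informal labels, not official API model IDs; check current pricing before use.
PRICES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "haiku": (1.00, 5.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at flat list prices (no caching or batch discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def savings(model: str, old_in: int, old_out: int, new_in: int, new_out: int) -> float:
    """USD saved per call when token counts drop from (old_in, old_out) to (new_in, new_out)."""
    return call_cost(model, old_in, old_out) - call_cost(model, new_in, new_out)


if __name__ == "__main__":
    model, n_in, n_out = "claude-3.5-sonnet", 500, 2000  # the code generation example
    baseline = call_cost(model, n_in, n_out)
    output_share = call_cost(model, 0, n_out) / baseline
    cut_input = savings(model, n_in, n_out, n_in // 2, n_out)   # compress the prompt by 50%
    cut_output = savings(model, n_in, n_out, n_in, n_out // 2)  # ask for 50% less output
    print(f"baseline ${baseline:.4f}/call, output share {output_share:.1%}")
    print(f"halving input saves ${cut_input:.5f}, halving output saves ${cut_output:.4f}")
    print(f"the output cut saves {cut_output / cut_input:.0f}x more")

    # Strategy (2): the same task on Haiku.
    print(f"Haiku: ${call_cost('haiku', n_in, n_out):.4f}/call")

    # Strategy (4): with max_tokens=1000 the call can never cost more than this.
    print(f"worst case with max_tokens=1000: ${call_cost(model, n_in, 1000):.4f}/call")
```

Swapping in token counts from your own logs shows quickly which side of the bill is worth optimizing.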
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:00:00.305834+00:00— report_created — created