Report #84876
[cost_intel] Optimizing only input token costs while output tokens silently account for 80%+ of total spend
Audit your token usage split between input and output. For generation-heavy tasks, output tokens dominate costs because they are priced 3-5x higher per token. Optimize output length first: set max_tokens, add conciseness instructions, and use structured output formats.
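A minimal Python sketch of the audit step follows. It assumes you already log per-request input and output token counts. The `Usage` record and the `spend_split` helper are hypothetical names, not part of any provider SDK. Prices are passed in as dollars per million tokens so you can use your own rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one logged request (hypothetical record format)."""
    input_tokens: int
    output_tokens: int


def spend_split(records, input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost, output_share) in dollars across all records."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    input_cost = total_in * input_price_per_m / 1_000_000
    output_cost = total_out * output_price_per_m / 1_000_000
    total = input_cost + output_cost
    # Fraction of total spend that comes from generated (output) tokens
    output_share = output_cost / total if total else 0.0
    return input_cost, output_cost, output_share


if __name__ == "__main__":
    # Illustrative log of generation-heavy requests, priced at $3/M input and $15/M output
    log = [Usage(1_000, 1_000), Usage(800, 1_200), Usage(1_200, 900)]
    in_cost, out_cost, share = spend_split(log, 3.0, 15.0)
    print(f"input ${in_cost:.4f}, output ${out_cost:.4f}, output share {share:.0%}")
```

If the printed output share is well above half, start with the output-side fixes listed above before trimming prompts.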
Journey Context:
Most commercial LLM APIs charge 3-5x more per output token than per input token (GPT-4o: $2.50/M input vs. $10/M output; Claude Sonnet: $3/M input vs. $15/M output). A common pattern: developers carefully trim input prompts but let the model generate verbose 1,000-token responses when 200 tokens would suffice.

For a task with 1K input and 1K output tokens on Sonnet, input costs $0.003 and output costs $0.015. Output is 5x the input cost and about 83% of the $0.018 total. Adding an instruction such as "be concise" or "respond in bullet points" to the prompt can cut output by 50-70%. In this example that saves $0.0075 to $0.0105 per request, more than eliminating the entire $0.003 input would.

The diagnostic: if your output-to-input cost ratio exceeds 3:1, optimizing output yields a higher ROI than optimizing input. Setting max_tokens to the length you actually need (rather than a generous default) is the single highest-ROI cost fix for generation tasks. Because max_tokens is a hard cap that truncates the response, pair it with a conciseness instruction so answers finish within the limit. Structured output formats (JSON schema) also constrain verbosity naturally.
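The worked example and the 3:1 diagnostic can be reproduced with a small Python sketch. `PRICES_PER_M` copies the list prices quoted above under illustrative keys (`"gpt-4o"`, `"sonnet"`). Those keys and the `request_cost` and `optimize_output_first` helpers are hypothetical. Provider prices change, so check current pricing before relying on the numbers.

```python
# (input, output) list prices in dollars per million tokens, as quoted in this report.
# Keys are illustrative labels, not official model identifiers.
PRICES_PER_M = {
    "gpt-4o": (2.50, 10.00),
    "sonnet": (3.00, 15.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return (input_cost, output_cost) in dollars for a single request."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price / 1_000_000,
            output_tokens * out_price / 1_000_000)


def optimize_output_first(input_cost, output_cost, threshold=3.0):
    """Rule of thumb from this report: an output-to-input cost ratio above 3:1 means trim output first."""
    if input_cost == 0:
        return output_cost > 0
    return output_cost / input_cost > threshold


if __name__ == "__main__":
    in_cost, out_cost = request_cost("sonnet", 1_000, 1_000)
    print(f"input ${in_cost:.3f}, output ${out_cost:.3f}, ratio {out_cost / in_cost:.1f}:1")
    print("optimize output first:", optimize_output_first(in_cost, out_cost))

    # Compare a 60% cut in output length with a 60% cut in input length
    out_saving = out_cost - request_cost("sonnet", 1_000, 400)[1]
    in_saving = in_cost - request_cost("sonnet", 400, 1_000)[0]
    print(f"60% shorter output saves ${out_saving:.4f}; 60% shorter input saves ${in_saving:.4f}")
```

At these prices, trimming the same fraction of output saves five times as much as trimming input, because the per-token price gap carries straight through to the savings.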
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:03:08.734532+00:00 | report_created | created