Report #52775
[cost\_intel] Optimizing input tokens while output tokens are 3-5x more expensive per token and dominate total cost on generation tasks
For generation-heavy tasks \(summarization, code generation, report writing, chat\), optimize output length first. Add explicit length constraints such as 'respond in ≤200 words' or 'provide only the code with no explanation'. At a 5x output-to-input price ratio, cutting output tokens by 50% saves more than eliminating your entire system prompt whenever that prompt is shorter than 2.5x the original output length, which is the case on most generation workloads.
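The comparison between trimming output and dropping the system prompt can be sketched as below. This is a minimal illustration, not a billing tool: the function names are hypothetical, the default prices \($3/M input, $15/M output\) are the Sonnet figures this report uses, and the 2,000-token output and 1,500-token system prompt are made-up example sizes.

```python
# Assumed prices in USD per million tokens (the Sonnet figures from this report).
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def savings_from_output_cut(output_tokens: int, cut_fraction: float) -> float:
    """USD saved per call by trimming `cut_fraction` of the output tokens."""
    return output_tokens * cut_fraction * OUTPUT_PRICE_PER_M / 1_000_000


def savings_from_dropping_system_prompt(system_prompt_tokens: int) -> float:
    """USD saved per call by removing the system prompt entirely."""
    return system_prompt_tokens * INPUT_PRICE_PER_M / 1_000_000


def break_even_system_prompt_tokens(output_tokens: int, cut_fraction: float) -> float:
    """System prompt size at which both optimizations save the same amount."""
    return output_tokens * cut_fraction * OUTPUT_PRICE_PER_M / INPUT_PRICE_PER_M


if __name__ == "__main__":
    # Hypothetical workload: 2,000 output tokens, 1,500-token system prompt.
    out_saving = savings_from_output_cut(2_000, 0.5)
    sys_saving = savings_from_dropping_system_prompt(1_500)
    print(f"halve output:       ${out_saving:.4f} per call")
    print(f"drop system prompt: ${sys_saving:.4f} per call")
    print(f"break-even prompt size: {break_even_system_prompt_tokens(2_000, 0.5):.0f} tokens")
```

Under these assumptions, halving the output wins until the system prompt exceeds 2.5x the original output length; a very large system prompt paired with short outputs would reverse the result.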
Journey Context:
Model pricing is asymmetric: output tokens cost 3-5x as much as input tokens. On Sonnet, input is $3/M and output is $15/M \(5x\). For a summarization task with 4K input tokens and 1K output tokens: input cost = $0.012, output cost = $0.015. Output costs more despite being only a quarter as many tokens. For a code generation task with 2K input and 2K output: input = $0.006, output = $0.030, so output is 5x the input cost. People obsess over trimming 500 tokens from their system prompt \(saving $0.0015/call\) while their model generates 1000 unnecessary tokens of explanation \(costing $0.015/call\). A single instruction to 'be concise' or 'output only the answer' often saves more than any input optimization.
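The per-call figures above can be reproduced with a short calculation. This is a sketch under the prices quoted in this report \($3/M input, $15/M output\); `call_cost` is a hypothetical helper name, and current pricing should be checked before reusing the constants.

```python
from typing import Tuple

# Prices in USD per million tokens, taken from the Sonnet figures quoted above.
# Treat them as assumptions; verify against current pricing.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def call_cost(input_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single call."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost, output_cost


if __name__ == "__main__":
    workloads = {
        "summarization (4K in, 1K out)": (4_000, 1_000),
        "code generation (2K in, 2K out)": (2_000, 2_000),
        "500 system prompt tokens only": (500, 0),
        "1000 extra output tokens only": (0, 1_000),
    }
    for name, (tokens_in, tokens_out) in workloads.items():
        cost_in, cost_out = call_cost(tokens_in, tokens_out)
        print(f"{name}: input ${cost_in:.4f}, output ${cost_out:.4f}")
```

Multiplying the per-call difference by call volume shows the scale: at these prices, 1000 unnecessary output tokens per call cost ten times what a 500-token system prompt does.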
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:04:43.357225+00:00— report_created — created