Report #70114
[cost\_intel] Ignoring output token costs: output tokens are priced 3-5x higher than input tokens on frontier models
For generation-heavy tasks \(long-form writing, detailed analysis, code generation\), minimize output tokens explicitly. Request bullet points instead of prose, use structured JSON output, set max\_tokens tightly, and for iterative refinement, ask for diffs instead of full rewrites. On Sonnet, output tokens cost $15/M versus $3/M for input, a 5x premium.
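To illustrate why a diff can be much smaller than a full rewrite, here is a minimal sketch using Python's standard `difflib`. The 40-line file and the `unified_diff_text` helper are hypothetical, and character counts are only a rough stand-in for token counts. In practice the model would be asked to emit a diff in this style rather than the whole file.

```python
import difflib


def unified_diff_text(original: str, revised: str, context_lines: int = 1) -> str:
    """Return a unified diff between two versions of a text."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        revised.splitlines(keepends=True),
        fromfile="original",
        tofile="revised",
        n=context_lines,
    )
    return "".join(diff)


if __name__ == "__main__":
    # Hypothetical 40-line file with a single changed line.
    original = "".join(f"setting_{i} = {i}\n" for i in range(40))
    revised = original.replace("setting_20 = 20\n", "setting_20 = 200\n")

    diff = unified_diff_text(original, revised)
    print(diff)
    # Character counts are only a rough proxy for token counts.
    print(f"full rewrite: {len(revised)} chars, diff: {len(diff)} chars")
```

The saving shrinks as the edit touches more of the file, so a diff is most useful for small, localized changes.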
Journey Context:
Most cost optimization advice focuses on input tokens \(shorter prompts, caching, smaller models\), but for tasks where the model generates 1000\+ output tokens, output cost dominates the total. A task with 500 input tokens and 2000 output tokens on Sonnet costs $0.0015 \(input\) \+ $0.03 \(output\) = $0.0315, so about 95% of the cost is output tokens. This compounds in agent loops: an agent that iterates 5 times, generating 2000 output tokens each time, spends $0.15 on output tokens alone per task. Practical mitigations: \(a\) ask for bullet points instead of paragraphs, typically 2-3x fewer output tokens for the same information density, \(b\) use structured JSON output, which tends to be more token-efficient than a narrative explanation as long as the schema stays compact, \(c\) set max\_tokens to cap runaway generation; the limit truncates the response rather than making it more concise, so pair it with an explicit length instruction, \(d\) for code review or editing tasks, ask for diffs or line-specific comments instead of full file rewrites, \(e\) for summarization, specify an explicit target length. A single instruction like 'respond in 3 bullet points max' can cut output token cost by around 60% with minimal quality impact, though the saving varies by task and is worth measuring.
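The arithmetic above can be reproduced with a short sketch. The prices are the Sonnet figures quoted in this report \($3/M input, $15/M output\); the function names are illustrative, and current prices should be checked before relying on the numbers.

```python
# Prices in USD per million tokens, taken from the Sonnet figures in this report.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one request."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost, output_cost


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the request cost that comes from output tokens."""
    input_cost, output_cost = request_cost(input_tokens, output_tokens)
    return output_cost / (input_cost + output_cost)


def loop_output_cost(iterations: int, output_tokens_per_iteration: int) -> float:
    """Output-only cost of an agent loop; input costs are ignored here."""
    return iterations * output_tokens_per_iteration * OUTPUT_PRICE_PER_M / 1_000_000


if __name__ == "__main__":
    inp, out = request_cost(500, 2000)
    print(f"input ${inp:.4f}, output ${out:.4f}, total ${inp + out:.4f}")
    print(f"output share: {output_share(500, 2000):.1%}")
    print(f"5-iteration loop, output only: ${loop_output_cost(5, 2000):.2f}")
```

Swapping in different token counts shows how quickly the output share falls once prompts grow long relative to responses, which is when input-side optimizations start to matter more.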
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:16:07.075367+00:00— report_created — created