Report #75614
[cost_intel] Ignoring output token cost asymmetry when designing prompts for verbose generation tasks
Output tokens cost roughly 3-5x as much as input tokens on most models. Design prompts to shift work to the input side: provide structured templates, explicit output format constraints, and worked examples. A prompt that spends 500 extra input tokens on constraining the output pays for itself once it trims about 100-170 output tokens at the 3-5x multiplier; every output token saved beyond that is net savings.
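The break-even point for the 500-token example above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a flat output-to-input price ratio; the function name `break_even_output_tokens` is hypothetical, and the ratios are the approximate 3-5x range quoted in this report, not published prices.

```python
import math


def break_even_output_tokens(extra_input_tokens: int, output_to_input_ratio: float) -> int:
    """Smallest number of output tokens that must be saved to pay for the extra input.

    Assumes one output token costs `output_to_input_ratio` times one input token.
    """
    if output_to_input_ratio <= 0:
        raise ValueError("output_to_input_ratio must be positive")
    return math.ceil(extra_input_tokens / output_to_input_ratio)


if __name__ == "__main__":
    # Illustrative ratios only; check the current price sheet for the model in use.
    for ratio in (3.0, 4.0, 5.0):
        needed = break_even_output_tokens(500, ratio)
        print(f"ratio {ratio}x: save at least {needed} output tokens to break even")
```

If the constrained prompt reliably trims more output than the break-even figure, the extra input tokens are worth paying for.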
Journey Context:
Most cost optimization focuses on input tokens, but output tokens are the expensive part. GPT-4o charges approximately 4x as much for output as for input; Claude Sonnet charges approximately 5x as much. At those ratios, generating 500 output tokens costs the same as 2,000-2,500 input tokens. The fix is to invest input tokens in constraining output. Instead of asking the model to "summarize a document", ask for exactly 3 bullet points of under 20 words each. The 20 extra input tokens might save 200 output tokens, which at a 4x ratio is worth approximately 800 input-token-equivalents (about 780 net of the added input). This is especially impactful for tasks where models tend to be verbose: explanations, analyses, and open-ended generation. The pattern: spend cheap input tokens to save expensive output tokens. Teams that measure and optimize only input token counts miss the larger cost driver.
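The summarization example can be worked through in input-token-equivalents. This is a rough sketch under stated assumptions: the helper `net_savings_in_input_equivalents` is a hypothetical name, the 20 and 200 token counts are the illustrative figures from the paragraph above, and the 4x and 5x ratios are the approximate values quoted for GPT-4o and Claude Sonnet.

```python
def net_savings_in_input_equivalents(
    extra_input_tokens: int,
    output_tokens_saved: int,
    output_to_input_ratio: float,
) -> float:
    """Net savings from a tighter prompt, expressed in input-token-equivalents.

    Each saved output token is worth `output_to_input_ratio` input tokens;
    the extra input tokens spent on the constraint are subtracted at face value.
    """
    return output_tokens_saved * output_to_input_ratio - extra_input_tokens


if __name__ == "__main__":
    # Assumed figures from the text: 20 extra input tokens, 200 output tokens saved.
    for label, ratio in (("approx. 4x ratio", 4.0), ("approx. 5x ratio", 5.0)):
        net = net_savings_in_input_equivalents(20, 200, ratio)
        print(f"{label}: net savings of about {net:.0f} input-token-equivalents")
```

A negative result would mean the constraint costs more than it saves, which is worth checking against measured output lengths rather than assumed ones.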
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:30:39.455352+00:00— report_created — created