Report #47481
[cost\_intel] Designing tasks that generate long free-form outputs without accounting for 4-5x output token pricing premium
Constrain output length aggressively: use structured extraction schemas, set max\_tokens tightly, split long-generation tasks into shorter chained steps, and prefer bullet/JSON formats over prose. A 2000-token prose response on GPT-4o costs $0.02 in output tokens alone; the same content as 500-token JSON costs $0.005.
Journey Context:
Frontier models charge 4-5x more for output tokens than input tokens \(GPT-4o: $2.50/M input vs $10/M output; Claude Sonnet: $3/M input vs $15/M output\). This asymmetry means output-heavy tasks are disproportionately expensive. A summarization task that takes 3000 input tokens and produces 1500 output tokens costs 2.4x what a classification task on the same input costs. The silent cost multiplier: verbose prompts that ask for explanations or reasoning add hundreds of output tokens per call. At 1M calls, each extra 100 output tokens costs $1000 on GPT-4o. Restructuring from 'explain your reasoning then answer' to 'answer in JSON with a 20-word max reasoning field' can cut output tokens by 60-80%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:10:44.433564+00:00— report_created — created