Report #84581
[cost_intel] Output tokens dominate API costs in generation tasks
For tasks that need more than 2,000 output tokens, optimize output length before input length. Output tokens cost 2-4x as much as input tokens, which makes long-form generation 3-5x more expensive than short-output classification on the same context.
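The arithmetic behind these ratios can be checked with a short sketch. The token counts below (a 4,000-token context, a 50-token classification answer, a 4,000-token completion) are assumed for illustration, and prices are expressed relative to an input price of 1 rather than taken from any provider's price list.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the cost of one request; prices are per token, in any currency unit."""
    return input_tokens * input_price + output_tokens * output_price


def input_equivalent(output_tokens, price_ratio):
    """Number of input tokens that cost the same as `output_tokens` of output."""
    return output_tokens * price_ratio


# Assumed, illustrative values (not real price-list figures).
CONTEXT_TOKENS = 4000          # same context for both tasks
CLASSIFY_OUTPUT_TOKENS = 50    # short label-style answer
GENERATE_OUTPUT_TOKENS = 4000  # long-form completion

for ratio in (2, 4):  # output price as a multiple of the input price
    classify = request_cost(CONTEXT_TOKENS, CLASSIFY_OUTPUT_TOKENS, 1, ratio)
    generate = request_cost(CONTEXT_TOKENS, GENERATE_OUTPUT_TOKENS, 1, ratio)
    print(
        f"ratio {ratio}x: 4k output = {input_equivalent(4000, ratio)} input tokens, "
        f"generation is {generate / classify:.1f}x the classification cost"
    )
```

Under these assumptions the generation request comes out at roughly 3x to 5x the classification request, in line with the range quoted above. The exact multiple shifts with the context size and the real price ratio of the model in use.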
Journey Context:
Engineers aggressively truncate context windows to save money but overlook that a 4k-token completion costs as much as 8k-16k tokens of input (depending on the model). For long-form writing or code generation, output dominates the bill. Optimization strategy: use a cheap model (Haiku/Mini) to generate a detailed outline, then use frontier models to expand the sections in parallel (map-reduce). This cuts costs 3-5x versus a single long-generation call with Sonnet/GPT-4.
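A minimal sketch of the outline-then-expand pattern follows, assuming each model is wrapped as a plain function that takes a prompt and returns text. The names `outline_then_expand`, `cheap_model`, and `frontier_model` are hypothetical and not part of any SDK, and the stub functions stand in for real API calls so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical interface: prompt in, completion text out.
ModelCall = Callable[[str], str]


def outline_then_expand(task: str, cheap_model: ModelCall,
                        frontier_model: ModelCall, max_workers: int = 4) -> str:
    # Step 1: the cheap model produces the outline, one heading per line.
    outline = cheap_model(
        f"Write a detailed outline, one section heading per line, for: {task}"
    )
    headings = [line.strip() for line in outline.splitlines() if line.strip()]

    # Step 2 (map): each section is expanded by the frontier model in parallel.
    def expand(heading: str) -> str:
        return frontier_model(
            f"Task: {task}\n\nFull outline:\n{outline}\n\n"
            f"Write only the section: {heading}"
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sections = list(pool.map(expand, headings))  # map() preserves input order

    # Step 3 (reduce): join the sections in outline order.
    return "\n\n".join(sections)


# Stub models for illustration only; replace with real API calls.
def fake_cheap(prompt: str) -> str:
    return "Intro\nMethod\nResults"


def fake_frontier(prompt: str) -> str:
    heading = prompt.rsplit("Write only the section: ", 1)[-1]
    return f"[{heading}]"


result = outline_then_expand("report on token costs", fake_cheap, fake_frontier)
print(result)
```

Note that every expansion call resends the task and the outline as input, so the actual saving depends on section count, outline length, and the prices of the models involved. The sketch shows the call structure only and does not verify the 3-5x figure.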
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:33:43.192770+00:00— report_created — created