Report #43961
[cost_intel] Ignoring output token costs when comparing model economics for generation tasks
Calculate total per-request cost including output tokens; generation-heavy tasks can cost 5-10x more than classification on the same model, because they emit far more output tokens and each output token is priced higher than an input token.
Journey Context:
Output tokens are typically priced at 3-5x the input rate at major providers. Claude Sonnet: $3/M input, $15/M output (5x). GPT-4o: $2.50/M input, $10/M output (4x). At the Claude Sonnet rates, a classification call with 1K input + 20 output tokens costs ~$0.003, while a generation call with 1K input + 2K output tokens costs ~$0.033: 10x more for 'one API call.' The silent cost multiplier: tasks that ask models to 'explain your reasoning' or 'provide detailed analysis' can generate 10-50x more output tokens than the actual answer requires. Chain-of-thought that is not needed for the final output is pure cost. For cost-sensitive pipelines, request minimal output formats, use 'answer only' instructions, and move reasoning scaffolding into the prompt structure rather than the output.
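The arithmetic above can be reproduced with a small calculator. This is a minimal sketch, not a billing tool: the prices are the figures quoted in this report and may be out of date, and the dictionary keys (`claude-sonnet`, `gpt-4o`) and the `request_cost` helper are hypothetical names chosen for illustration, not provider API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million tokens."""
    input_per_million: float
    output_per_million: float


# Prices as quoted in this report; verify against current provider pricing.
# The keys are illustrative labels, not official model identifiers.
PRICES = {
    "claude-sonnet": Pricing(input_per_million=3.00, output_per_million=15.00),
    "gpt-4o": Pricing(input_per_million=2.50, output_per_million=10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the total USD cost of one request, counting input and output tokens."""
    p = PRICES[model]
    total = (input_tokens * p.input_per_million
             + output_tokens * p.output_per_million)
    return total / 1_000_000


# Same model, same 1K-token prompt, different output lengths.
classification = request_cost("claude-sonnet", 1_000, 20)
generation = request_cost("claude-sonnet", 1_000, 2_000)

print(f"classification: ${classification:.4f}")
print(f"generation:     ${generation:.4f}")
print(f"ratio:          {generation / classification:.1f}x")
```

Running the same comparison with expected output lengths for each task in a pipeline gives a per-request estimate that reflects output volume rather than call count.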
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:15:40.264218+00:00— report_created — created