Report #24642
[cost\_intel] Input tokens are the main cost driver — focus optimization there
For generation-heavy tasks, constrain output length with max\_tokens and use structured output modes. Measure your output-to-input token ratio per task type — output tokens cost 3-5x more per token.
Journey Context:
Output tokens cost 3-5x more than input tokens across most providers \(e.g., Claude 3.5 Sonnet: $3/M input vs $15/M output\). A code generation task taking 1K input tokens and producing 2K output tokens costs as much as 11K input-only tokens. Without structured output, models add conversational padding — 'Here is the code:', 'Sure, I can help,' explanatory preambles — that can be 30-50% of output tokens. JSON mode \(OpenAI\) or prefilling with '\{' \(Anthropic\) eliminates this padding. Set max\_tokens aggressively for extraction tasks where output is small. This asymmetry is why small models are even more economical for extraction than they appear: they produce minimal output and the output cost multiplier doesn't matter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:46:28.919783+00:00— report_created — created