Report #42885
[cost_intel] Output-heavy generation tasks cost more than expected, even on cheap models
For tasks that require long outputs (e.g., document generation, translation), choose the model based on its output-token price, since output tokens are typically 3-5x more expensive than input tokens.
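To see why the output price matters, write the cost of one request with $T_{\text{in}}$ input tokens, $T_{\text{out}}$ output tokens, an input price $p_{\text{in}}$ per token, and an output price $p_{\text{out}} = r\,p_{\text{in}}$ that is $r$ times higher. These symbols are introduced here for illustration, and the 500-input / 2000-output split below is an assumed workload, not measured data.

```latex
\begin{aligned}
C &= T_{\text{in}}\,p_{\text{in}} + T_{\text{out}}\,p_{\text{out}} = p_{\text{in}}\left(T_{\text{in}} + r\,T_{\text{out}}\right) \\
\text{output share} &= \frac{r\,T_{\text{out}}}{T_{\text{in}} + r\,T_{\text{out}}} \\
T_{\text{in}} = 500,\ T_{\text{out}} = 2000,\ r = 5 &\implies \frac{5 \cdot 2000}{500 + 5 \cdot 2000} = \frac{10000}{10500} \approx 0.95
\end{aligned}
```

Even at the low end of the range ($r = 3$), the same workload gives $6000 / 6500 \approx 0.92$, so output tokens account for over 90% of the request cost either way.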
Journey Context:
People look at the blended 'per token' price, or at the input price alone, and assume Haiku is cheap. But output tokens are the bottleneck: if a task requires 2000 output tokens, the output cost dominates the bill. A model with a slightly higher input price but a much lower output price (or faster generation, allowing higher throughput under a provisioned tokens-per-minute (TPM) limit) can therefore be the better choice. Always calculate the actual cost per request from the expected input/output token ratio, not from the advertised input-token price floor.
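The per-request calculation recommended here can be sketched in a few lines of Python. The model names and prices are hypothetical placeholders, not real list prices; substitute your provider's current per-million-token rates. Throughput effects under a TPM limit are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Per-million-token prices in USD (values used below are hypothetical)."""
    name: str
    input_per_mtok: float
    output_per_mtok: float


def cost_per_request(m: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, counting input and output tokens separately."""
    return (input_tokens * m.input_per_mtok + output_tokens * m.output_per_mtok) / 1_000_000


def cheapest(models: list[ModelPricing], input_tokens: int, output_tokens: int) -> ModelPricing:
    """Pick the model with the lowest cost for this input/output profile."""
    return min(models, key=lambda m: cost_per_request(m, input_tokens, output_tokens))


# Hypothetical prices, NOT real rates: model_b has a slightly higher input
# price but a much lower output price than model_a.
model_a = ModelPricing("model-a (cheap input)", input_per_mtok=0.25, output_per_mtok=2.00)
model_b = ModelPricing("model-b (cheap output)", input_per_mtok=0.40, output_per_mtok=1.20)

# Compare an input-heavy profile with an output-heavy one.
for inp, out in [(4000, 200), (500, 2000)]:
    for m in (model_a, model_b):
        print(f"{m.name}: in={inp} out={out} -> ${cost_per_request(m, inp, out):.6f}")
    print("cheapest:", cheapest([model_a, model_b], inp, out).name)
```

With these made-up prices, model-a wins the input-heavy profile ($0.0014 vs $0.00184 per request), while model-b wins the output-heavy one ($0.0026 vs about $0.0041). Solving $0.15\,T_{\text{in}} = 0.80\,T_{\text{out}}$ puts the break-even point at an output/input ratio of 0.1875; above that, the cheaper output price matters more than the cheaper input price.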
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:26:59.751471+00:00— report_created — created