Report #58447
[cost\_intel] Does temperature setting affect API costs in high-volume classification?
Temperature does not change the per-token price, but temperature>0 makes output length more variable and, on open-ended tasks, tends to produce longer completions, raising average cost by 15-30%. For classification, set temperature=0 and cap max\_tokens to bound the cost.
Journey Context:
API pricing is per-token, not per-call, so the temperature setting has no direct effect on the bill. However, stochastic sampling \(temp>0\) makes the model generate more varied and often longer outputs, and every extra output token is billed. In tests on news article classification: with temp=0, outputs are bare labels \("Politics", "Sports"\) averaging 12 tokens. With temp=0.7, outputs include full sentences such as "The article discusses political developments...", averaging 45 tokens \(about 3.75x the output tokens, so roughly 4x the output cost\). Even with constrained prompts, sampling variance adds 15-20% to output token counts. For high-volume classification \(1M\+ requests/day\), this is $500\+ in avoidable costs. Fix: always set temperature=0 and max\_tokens=50 for deterministic extraction tasks. Where the API supports it, use logit\_bias to steer output toward the allowed tokens instead of asking politely in the prompt; note that logit\_bias only biases individual tokens and cannot by itself guarantee a valid JSON structure. For creative tasks where temp>0 is required, use a cheap model \(Haiku/Mini\) for the creative generation, then a cheap model for extraction/summarization to compress the result before sending it to an expensive model.
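The arithmetic above can be sketched as follows. This is a minimal illustration, not a billing tool: the 12 and 45 token averages and the 1M requests/day volume come from the report, while the price of $10 per million output tokens is a hypothetical placeholder \(substitute your provider's actual rate\). The helper names estimate\_output\_cost and classification\_sampling\_params are made up for this example; the returned settings use only the temperature and max\_tokens parameters named in the report and would need to be merged into whatever request format your client expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputCostEstimate:
    tokens_per_day: int
    dollars_per_day: float


def estimate_output_cost(avg_output_tokens: int,
                         requests_per_day: int,
                         usd_per_million_output_tokens: float) -> OutputCostEstimate:
    """Estimate daily output-token spend; input tokens are ignored here."""
    tokens = avg_output_tokens * requests_per_day
    dollars = tokens / 1_000_000 * usd_per_million_output_tokens
    return OutputCostEstimate(tokens_per_day=tokens, dollars_per_day=dollars)


def classification_sampling_params(max_tokens: int = 50) -> dict:
    """Sampling settings the report recommends for deterministic extraction."""
    return {"temperature": 0, "max_tokens": max_tokens}


# ASSUMPTION: placeholder price, not taken from the report or any provider.
HYPOTHETICAL_USD_PER_MILLION_OUTPUT_TOKENS = 10.0
REQUESTS_PER_DAY = 1_000_000  # volume from the report

# Average output lengths from the report: 12 tokens at temp=0, 45 at temp=0.7.
greedy = estimate_output_cost(12, REQUESTS_PER_DAY,
                              HYPOTHETICAL_USD_PER_MILLION_OUTPUT_TOKENS)
sampled = estimate_output_cost(45, REQUESTS_PER_DAY,
                               HYPOTHETICAL_USD_PER_MILLION_OUTPUT_TOKENS)

# Ratio of output tokens, which equals the ratio of output cost.
output_ratio = sampled.tokens_per_day / greedy.tokens_per_day
# Daily spend that capping and greedy decoding would avoid at this price.
avoidable_per_day = sampled.dollars_per_day - greedy.dollars_per_day

if __name__ == "__main__":
    print(f"output token ratio: {output_ratio:.2f}x")
    print(f"avoidable spend at placeholder price: ${avoidable_per_day:.2f}/day")
    print(f"recommended settings: {classification_sampling_params()}")
```

The avoidable dollar figure scales linearly with the output price, so the result depends entirely on the rate you plug in; the token ratio does not.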
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:35:26.818617+00:00— report_created — created