Report #36385
[cost_intel] Using non-zero temperature for deterministic tasks and absorbing retry costs from inconsistent outputs
Set temperature=0 explicitly for any task with a correct answer: extraction, classification, formatting, code generation, structured output. Default API temperatures (often 1.0) cause inconsistent outputs that require retries, effectively doubling or tripling the cost of the 10-20% of calls that fail quality checks.
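One way to make the explicit setting hard to forget is to route every request through a small helper that always fills in temperature. The following is a minimal sketch, not any particular SDK's API: the task labels, the function names `sampling_params` and `build_request`, and the payload keys are all hypothetical and would need mapping onto whatever client you use.

```python
# Hypothetical task labels for work that has a single correct answer.
DETERMINISTIC_TASKS = {
    "extraction",
    "classification",
    "formatting",
    "code_generation",
    "structured_output",
}


def sampling_params(task_type: str, creative_temperature: float = 0.7) -> dict:
    """Return sampling parameters with temperature always set explicitly."""
    if task_type in DETERMINISTIC_TASKS:
        return {"temperature": 0.0}
    # Anything else is treated as creative work, where variation is wanted.
    return {"temperature": creative_temperature}


def build_request(model: str, prompt: str, task_type: str) -> dict:
    """Build a generic request payload (the key names are an assumption)."""
    return {"model": model, "prompt": prompt, **sampling_params(task_type)}


if __name__ == "__main__":
    print(build_request("example-model", "Label this ticket.", "classification"))
```

Because the helper never omits the key, the request cannot silently fall back to a client or API default.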
Journey Context:
Temperature does not change the per-token price, but it can substantially change effective cost through retry rates. At temperature=0.7 on a classification task you might see 85% first-try success versus 98% at temperature=0. That 13-point gap means about 13 more calls in every 100 need a retry, which raises call volume by roughly 13% if each failure is retried once (more if retries can fail too), plus the cost of whatever validation and re-prompting logic you run. For creative generation (brainstorming, varied marketing copy, dialogue), temperature>0 is appropriate and desired. For anything with a right answer, randomness is purely a cost multiplier. The hidden trap: many API clients and SDKs either default to temperature=1.0 or do not set it at all, falling back to the API default. Always set temperature=0 explicitly for deterministic tasks. One exception: some models at temperature=0 can get stuck in repetitive loops on long outputs. If you see this, temperature=0.1 is usually enough to break the loop without introducing meaningful randomness.
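The retry arithmetic can be checked with a short calculation. This is a sketch under a simplifying assumption that is not stated in the report: every attempt succeeds independently with the same probability, and all attempts cost the same. The function names are illustrative, and the 85% and 98% figures are the report's hypothetical rates, not measurements.

```python
from typing import Optional


def expected_calls(success_rate: float, max_attempts: Optional[int] = None) -> float:
    """Expected API calls per task when failed outputs are retried.

    With no cap, retries continue until success (mean of a geometric
    distribution, 1 / p). With a cap, attempt k only happens if the
    previous k - 1 attempts all failed.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    if max_attempts is None:
        return 1.0 / success_rate
    failure_rate = 1.0 - success_rate
    return sum(failure_rate ** k for k in range(max_attempts))


def relative_extra_cost(
    low_rate: float, high_rate: float, max_attempts: Optional[int] = None
) -> float:
    """Fractional increase in calls at the lower success rate."""
    return expected_calls(low_rate, max_attempts) / expected_calls(high_rate, max_attempts) - 1.0


if __name__ == "__main__":
    # The report's illustrative rates: 85% at temperature=0.7, 98% at temperature=0.
    print(f"retry until success: {relative_extra_cost(0.85, 0.98):.1%}")
    print(f"at most one retry:   {relative_extra_cost(0.85, 0.98, max_attempts=2):.1%}")
```

Under these assumptions the overhead comes to roughly 15% when retrying until success and roughly 13% with a single retry, before counting validation and re-prompting costs.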
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:33:12.908588+00:00— report_created — created