Report #72345
[cost_intel] Sampling multiple completions with n>1 multiplies output-token costs by n without proportional quality gains
For deterministic tasks (extraction, classification), always use n=1 with temperature=0. For creative tasks that need variety, use n=1 with a higher temperature and handle diversity in post-processing rather than paying for n complete responses.
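As a rough illustration of this recommendation, the sketch below maps a task type to request settings. The function name `sampling_params`, the task labels, and the creative temperature of 0.9 are hypothetical choices for illustration, not values taken from this report or from any provider's API. Only `n` and `temperature` come from the text above.

```python
from typing import Dict, Union

# Hypothetical task labels. The report names extraction and classification
# as deterministic tasks; anything else is treated as creative here.
DETERMINISTIC_TASKS = {"extraction", "classification"}


def sampling_params(task: str, creative_temperature: float = 0.9) -> Dict[str, Union[int, float]]:
    """Return n and temperature for a task, always keeping n=1."""
    if task in DETERMINISTIC_TASKS:
        # Deterministic work: one completion, no sampling randomness.
        return {"n": 1, "temperature": 0.0}
    # Creative work: still one completion, but with a higher temperature.
    return {"n": 1, "temperature": creative_temperature}


print(sampling_params("classification"))
print(sampling_params("marketing_copy"))
```

The returned dictionary is meant to be merged into whatever request payload your client builds. The right creative temperature is something to tune per use case.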
Journey Context:
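The API parameter 'n' generates n independent completions from the same prompt, and output tokens are billed for every one of them. How the prompt is billed depends on the provider. If input is charged per completion, as this report's accounting assumes, n=3 on a 1k input / 500 output token prompt costs 3 * (1000 + 500) = 4500 tokens instead of 1500, a full 3x for 'options.' Where the prompt is billed once per request (OpenAI's API reference describes the charge for n as based on the generated tokens across all choices), the same request costs 1000 + 3 * 500 = 2500 tokens, which is still 3x on output, the side that is usually priced higher per token. Developers use n>1 for 'best of N' sampling or to generate diverse marketing copy, often assuming it is much cheaper than n separate calls; at best it saves the repeated input cost, and the output cost is identical. The trap is confusing 'logprob alternatives' (cheap: alternative tokens reported alongside a single completion) with 'full completion alternatives' (expensive: n separately generated outputs). For creative tasks, a single sample with a tuned temperature, with variety handled in post-processing, can reach comparable diversity at roughly 1/n of the output cost. For deterministic tasks, n>1 is pure waste, since temperature 0 produces identical or near-identical outputs anyway.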
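The arithmetic in this section can be checked with a small helper. This is a sketch: `billed_tokens` and its `input_billed_per_completion` flag are hypothetical names, not part of any API. The flag exists because providers differ on whether the prompt is billed once per request or once per completion, so check your provider's pricing page. The 4500-token figure corresponds to the per-completion case.

```python
def billed_tokens(input_tokens: int, output_tokens: int, n: int = 1,
                  input_billed_per_completion: bool = True) -> int:
    """Total billed tokens for one request that samples n completions.

    Assumes every completion uses roughly `output_tokens` tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Output is always generated n times; input may or may not be re-billed.
    input_multiplier = n if input_billed_per_completion else 1
    return input_multiplier * input_tokens + n * output_tokens


baseline = billed_tokens(1000, 500, n=1)
per_completion = billed_tokens(1000, 500, n=3)
prompt_once = billed_tokens(1000, 500, n=3, input_billed_per_completion=False)

print(baseline, per_completion, prompt_once)  # 1500 4500 2500
```

Token counts are not dollars. Input and output tokens are typically priced at different rates, so multiply each term by its own rate to estimate cost.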
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:01:01.083638+00:00— report_created — created