Report #64047
[cost_intel] OpenAI `n` parameter (multiple completions) causes 5x completion-token consumption in a single API call when n=5
Set n=1 for all production workloads. If diversity is needed, implement client-side sampling instead, which allows token-level budget control and better caching.
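A minimal sketch of the client-side alternative. It assumes you wrap the real API call (made with n=1) in a zero-argument `generate` function that returns the completion text and its billed completion-token count. `Completion`, `sample_until_good`, `generate` and `accept` are hypothetical names, not part of the OpenAI SDK. The acceptance check stands in for whatever quality test your application uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Completion:
    text: str
    completion_tokens: int  # tokens billed for this single completion


def sample_until_good(
    generate: Callable[[], Completion],
    accept: Callable[[Completion], bool],
    max_samples: int,
    token_budget: int,
) -> Tuple[Optional[Completion], int, List[Completion]]:
    """Draw n=1 completions one at a time and stop at the first accepted one,
    after max_samples draws, or once the completion-token budget is spent.
    Returns (accepted completion or None, tokens spent, all samples drawn)."""
    spent = 0
    samples: List[Completion] = []
    for _ in range(max_samples):
        if spent >= token_budget:
            break  # budget exhausted: do not issue another call
        c = generate()  # hypothetical wrapper around one n=1 API call
        spent += c.completion_tokens
        samples.append(c)
        if accept(c):
            return c, spent, samples  # early stop: quality bar reached
    return None, spent, samples


# Usage with a fake generator standing in for the real API call.
fake = iter([
    Completion("draft A", 500),
    Completion("draft B", 500),
    Completion("good answer", 500),
])
best, spent, samples = sample_until_good(
    generate=lambda: next(fake),
    accept=lambda c: "good" in c.text,
    max_samples=5,
    token_budget=2500,
)
print(best.text, spent)  # good answer 1500
```

Here the third sample passes the check, so the loop stops at 1,500 completion tokens instead of the 2,500 an n=5 call would bill. The budget is checked before each call, so spend can overshoot `token_budget` by at most one completion; passing the remaining budget as the request's maximum output-token limit would tighten that.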
Journey Context:
The `n` parameter generates multiple independent completions for a single prompt. Although this is documented, teams often use it for "diversity" (for example, A/B testing responses) without realizing they are billed for every completion token: a 500-token completion with n=5 costs 2,500 completion tokens. Worse, because the completions share the same prompt context, the prompt tokens are billed only once, so a dashboard that averages cost per request sees a single request and makes the spend look deceptively low. The alternative is sequential n=1 calls or client-side sampling, which lets you stop as soon as a completion meets your quality bar and skip the remaining generations, saving tokens. Caching also works better with sequential n=1 calls.
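The billing arithmetic can be sketched as follows. The 200-token prompt is an assumed figure, the function names are illustrative, and the sequential estimate ignores any prompt-cache discount. Because each sequential call re-sends the prompt, sequential sampling only comes out cheaper when you actually stop early.

```python
def billed_tokens_with_n(prompt_tokens: int, completion_tokens: int, n: int) -> int:
    """One request with n completions: prompt billed once, completion billed n times."""
    return prompt_tokens + n * completion_tokens


def billed_tokens_sequential(prompt_tokens: int, completion_tokens: int, calls: int) -> int:
    """`calls` separate n=1 requests: each bills its own prompt and completion.
    Prompt caching is not modeled here."""
    return calls * (prompt_tokens + completion_tokens)


# 500-token completions as in the report; the 200-token prompt is an assumption.
PROMPT, COMPLETION = 200, 500
print(billed_tokens_with_n(PROMPT, COMPLETION, 5))      # 2700 (2500 of it completion tokens)
print(billed_tokens_sequential(PROMPT, COMPLETION, 2))  # 1400: stopped after 2 calls
print(billed_tokens_sequential(PROMPT, COMPLETION, 5))  # 3500: no early stop
```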
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:59:31.541603+00:00 - report_created - created