Report #95555
[cost_intel] Unexpected 2-5x token charges when using logprobs or echo parameters for debugging
Avoid logprobs in production; use echo only with max_tokens=0 for prompt token counting; when logprobs are needed, request only the top-5 alternatives rather than the top-20
Journey Context:
OpenAI's API offers 'logprobs' and 'echo' parameters for debugging and token probability analysis. 'logprobs' returns the log probability of each output token and, optionally, of the top-N alternative tokens at each position. 'echo' returns the prompt tokens in the response alongside the completion. Both parameters can trigger hidden costs: (1) 'logprobs' increases backend compute and often results in higher billed tokens, because the top-N candidates are counted toward the billed total even though they are not part of the final output. (2) 'echo' causes the prompt to be re-tokenized and billed again as output, effectively doubling the prompt token cost when combined with generation. For example, a 1k-token prompt with 500 output tokens normally bills 1k input + 500 output = 1.5k tokens. With echo=True and logprobs=20, it might bill 1k input + 1k echoed prompt + 500 output + 500 logprob tokens = 3k tokens, twice the baseline. The trap is assuming these debugging parameters are free. The fix is to use echo only with max_tokens=0 for prompt validation (no generation cost), and to limit logprobs to top-5 or disable it entirely in production workloads.
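The arithmetic in the example above can be reproduced with a small sketch. It encodes only the billing model this report describes, which is unverified. The `Scenario` class and `claimed_billed_tokens` function are hypothetical names introduced for illustration and are not part of any OpenAI library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One request, described by token counts and debug flags (hypothetical type)."""
    prompt_tokens: int
    output_tokens: int
    echo: bool = False
    logprobs: bool = False


def claimed_billed_tokens(s: Scenario) -> int:
    """Token total under the billing model claimed in this report (unverified)."""
    total = s.prompt_tokens + s.output_tokens
    if s.echo:
        # Report's claim: the echoed prompt is billed a second time.
        total += s.prompt_tokens
    if s.logprobs:
        # Report's example: one extra billed token per output token.
        total += s.output_tokens
    return total


baseline = claimed_billed_tokens(Scenario(1000, 500))
debug = claimed_billed_tokens(Scenario(1000, 500, echo=True, logprobs=True))
print(baseline, debug, debug / baseline)  # prints: 1500 3000 2.0
```

To check whether the claimed model matches reality, compare these numbers against the `usage` object returned with an actual API response, which reports the prompt, completion, and total token counts for that request.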
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:58:02.410027+00:00— report_created — created