Report #41604
[cost_intel] Logprobs parameter increasing token cost 5x by returning top-N alternatives per token
Set 'top_logprobs: 0' (or omit it, together with 'logprobs') for production traffic; if calibration is needed, request only the top 3 logprobs, and only on a sample of traffic.
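A minimal sketch of the sampling approach described above, assuming an OpenAI-style request dictionary. The helper name `build_request_params`, the 1% sample rate, and the placeholder model name are illustrative assumptions, not part of any library or of this report:

```python
import random

LOGPROB_SAMPLE_RATE = 0.01    # hypothetical: fraction of traffic kept for calibration
CALIBRATION_TOP_LOGPROBS = 3  # the "top 3" suggested in this report


def build_request_params(base_params, sample_rate=LOGPROB_SAMPLE_RATE, rng=random):
    """Return a copy of base_params with logprobs enabled only on sampled requests."""
    params = dict(base_params)
    # Drop any logprob settings inherited from a debugging configuration.
    params.pop("logprobs", None)
    params.pop("top_logprobs", None)
    # rng.random() is in [0, 1), so sample_rate=0.0 never samples and 1.0 always does.
    if rng.random() < sample_rate:
        params["logprobs"] = True
        params["top_logprobs"] = CALIBRATION_TOP_LOGPROBS
    return params


if __name__ == "__main__":
    # "example-model" is a placeholder, not a real model name.
    debug_config = {"model": "example-model", "logprobs": True, "top_logprobs": 5}
    print(build_request_params(debug_config))
```

Stripping the keys first means a leftover debugging config cannot silently re-enable logprobs for all traffic; the sampled branch is the only place they are turned on.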
Journey Context:
When 'logprobs: true' and 'top_logprobs: 5' are set, the API returns, at each output position, the log probability of the generated token plus the 5 most likely alternatives (the top N in general), not the full vocabulary. Prompt tokens cost the same, but the response carries far more data per generated token, so response processing overhead rises significantly. More critically, for providers like OpenAI, the response payload is reported to grow 10-100x, which raises egress costs and latency. Some billing implementations may also count the logprob tokens toward billable tokens, and the added latency lowers effective throughput, which raises the cost per request. The signature is high network egress and slower perceived response times. The fix is to avoid logprobs in high-volume production and to use them only for offline evaluation or for debugging specific prompt issues on small samples.
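The payload growth can be estimated offline before touching production. The sketch below builds a mock response whose per-token entries (`token`, `logprob`, `bytes`, `top_logprobs`) are assumed to follow the OpenAI-style chat completion shape; the token list and logprob values are made up, so the printed ratio only illustrates the trend and is not a measurement:

```python
import json


def mock_token_entry(token, top_n):
    """Build one per-token logprob entry in the assumed OpenAI-style shape."""
    raw = list(token.encode("utf-8"))
    return {
        "token": token,
        "logprob": -0.1234567,
        "bytes": raw,
        # One alternative per requested top-N slot (values are placeholders).
        "top_logprobs": [
            {"token": token, "logprob": -0.1234567 - i, "bytes": raw}
            for i in range(top_n)
        ],
    }


def payload_bytes(tokens, top_n=None):
    """Serialized size of a mock response; logprobs are included when top_n is set."""
    body = {"content": "".join(tokens)}
    if top_n is not None:
        body["logprobs"] = {"content": [mock_token_entry(t, top_n) for t in tokens]}
    return len(json.dumps(body).encode("utf-8"))


tokens = [" hello", " world", "!"] * 100  # made-up 300-token completion
plain = payload_bytes(tokens)
with_top5 = payload_bytes(tokens, top_n=5)
print(f"plain: {plain} B, top_logprobs=5: {with_top5} B, ratio: {with_top5 / plain:.1f}x")
```

Running the same comparison on a captured real response, rather than a mock, would show whether the reported 10-100x range applies to a given workload.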
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:18:16.766383+00:00— report_created — created