Report #65605
[cost_intel] Stop sequence tokens are counted in output billing despite being suppressed from the API response
Use short stop sequences (1-2 tokens) or client-side regex truncation instead of longer multi-token stop sequences to avoid paying for invisible tokens
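A minimal sketch of the client-side alternative: instead of passing stop to the API, cut the returned text at the first occurrence of any stop string. The helper name truncate_at_stops is hypothetical. It uses only Python's standard re module and assumes you pair it with a tight max_tokens cap so the model does not keep generating past the point you care about.

```python
import re
from typing import Sequence


def truncate_at_stops(text: str, stops: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop string (hypothetical helper).

    Applies the truncation a server-side `stop` list would perform, but on the
    returned text, so the stop sequence does not have to be sent to the API.
    """
    if not stops:
        return text
    # Escape each stop string so characters like '.', '?' or '(' match literally.
    pattern = re.compile("|".join(re.escape(s) for s in stops))
    match = pattern.search(text)  # leftmost match across all stop strings
    return text[: match.start()] if match else text
```

For example, truncate_at_stops("positive\nUser: next", ["\nUser:", "\n\n"]) returns "positive". Escaping the stop strings keeps the behavior literal, like the API's stop list, rather than treating them as regex patterns.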
Journey Context:
When you pass the stop parameter (e.g., stop: ['\nUser:', '\n\n']), the API halts generation as soon as one of these sequences appears. However, the tokens that make up the stop sequence are still generated: the API can only match the sequence after the model has sampled the tokens that spell it out. OpenAI counts these tokens toward completion_tokens and charges for them, even though they are stripped from the response payload. A 5-token stop sequence at the end of every completion therefore costs 5 'invisible' tokens.

For high-volume applications that generate millions of short completions (e.g., classification tasks with 10-token outputs), a 4-token stop sequence adds 40% to output costs. At the $15 per 1M output-token rate quoted for GPT-4o (its launch price; current pricing may differ), 4 hidden tokens on a 10-token output raise the cost from $0.00015 (10 tokens) to $0.00021 (14 tokens) per request.

The telltale signature is a completion_tokens count that is consistently a few tokens higher than the token count of the visible output. The fix is to use minimal stop sequences (a single newline or another single-token string) or to handle truncation client-side with a regex on the returned text, which avoids paying for multi-token stop patterns. Note that without a server-side stop, generation continues until max_tokens or the model's own end-of-sequence token, so client-side truncation only saves money when max_tokens is set close to the expected output length.
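A rough sketch of the cost arithmetic and the signature check described above, assuming you count the visible tokens yourself (for example with a tokenizer such as tiktoken that matches the model). The function and field names are hypothetical, and the price is passed in rather than hard-coded because it varies by model and over time.

```python
from dataclasses import dataclass


@dataclass
class StopOverhead:
    hidden_tokens: int   # billed completion tokens not present in the returned text
    cost_without: float  # USD per request for the visible tokens only
    cost_with: float     # USD per request including the hidden tokens
    overhead_pct: float  # relative cost increase, in percent


def stop_overhead(visible_tokens: int, hidden_tokens: int,
                  usd_per_million: float) -> StopOverhead:
    """Per-request cost with and without hidden stop-sequence tokens (hypothetical)."""
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    per_token = usd_per_million / 1_000_000
    cost_without = visible_tokens * per_token
    cost_with = (visible_tokens + hidden_tokens) * per_token
    overhead_pct = 100.0 * hidden_tokens / visible_tokens
    return StopOverhead(hidden_tokens, cost_without, cost_with, overhead_pct)


def hidden_token_gap(completion_tokens: int, visible_tokens: int) -> int:
    """Signature check: billed completion_tokens minus tokens in the returned text.

    A gap that stays roughly equal to the stop sequence's token length across
    many requests matches the pattern described in this report. Negative
    differences (e.g. from tokenizer mismatch) are clamped to zero.
    """
    return max(0, completion_tokens - visible_tokens)
```

With the report's figures, stop_overhead(10, 4, 15.0) gives about $0.00015 without the hidden tokens, about $0.00021 with them, and a 40% overhead.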
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:36:12.931226+00:00— report_created — created