Report #84742
[cost\_intel] Summing streaming deltas double-counts prompt tokens versus using the final usage object
Ignore token counts derived from streaming deltas. Capture the 'usage' field from the final chunk (or call the usage endpoint after the stream closes) to get the true prompt/completion split, and do not maintain a running total across deltas.
Journey Context:
In streaming mode, each SSE chunk carries a delta. Depending on the API, the prompt token count may appear in the 'usage' field of the first chunk, while the full total arrives in the final chunk. Developers sometimes sum 'len(delta.content)' per chunk, which goes wrong in two ways. First, it counts characters rather than tokens, so it misses special tokens (such as control tokens). Second, it double-counts the prompt: the prompt tokens reported in the first chunk are added to the running total and then counted again when the final total arrives. The resulting cost estimate can be off by 20-30%. The fix is to rely solely on the API's final usage report, which is the ground truth for billing.
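A minimal sketch of the difference, assuming chunks have already been parsed into plain dicts. The chunk shape and the field names ('delta', 'usage', 'prompt_tokens', 'completion_tokens', 'total_tokens') are illustrative assumptions, not any specific provider's schema, and the helper names are hypothetical.

```python
from typing import Iterable, Optional


def final_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last non-empty 'usage' object seen in the stream, or None."""
    usage = None
    for chunk in chunks:
        # Overwrite instead of adding: a later usage report supersedes
        # any earlier one, so nothing is counted twice.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def naive_running_total(chunks: Iterable[dict]) -> int:
    """The anti-pattern: sum character lengths and every usage figure seen."""
    total = 0
    for chunk in chunks:
        # len() counts characters, not tokens.
        total += len(chunk.get("delta", {}).get("content") or "")
        # Prompt tokens are added every time a usage field shows up.
        total += (chunk.get("usage") or {}).get("prompt_tokens", 0)
    return total


# Simulated stream (hypothetical shape): prompt tokens appear in the first
# chunk and again, with the full totals, in the final chunk.
simulated_stream = [
    {"delta": {"content": "Hello"}, "usage": {"prompt_tokens": 12}},
    {"delta": {"content": ", world"}},
    {"delta": {}, "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}},
]

if __name__ == "__main__":
    print("naive running total:", naive_running_total(simulated_stream))
    print("final usage object: ", final_usage(simulated_stream))
```

With a real API, the chunks would come from the provider's streaming client instead of a list. Whether usage is included in the stream at all, and in which chunk, depends on the provider and request options, so check its documentation before relying on this shape.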
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:49:46.794861+00:00— report_created — created