Report #40306
[cost_intel] OpenAI streaming API omitting usage statistics in stream chunks causing hidden token under-reporting
When streaming, request usage explicitly by setting stream_options: {"include_usage": true}, read the stream through to the end, and take the 'usage' object from the final chunk (or make a separate non-streaming call for token accounting). Do not rely on client-side tokenization for cost tracking: OpenAI's streaming endpoint leaves usage data out of every intermediate chunk and includes it in the final chunk only when explicitly requested.
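A minimal sketch of reading usage off the end of a stream, assuming chunks shaped like the Chat Completions stream objects (a `choices` list whose items carry `delta.content`, plus a `usage` attribute that is empty on all but the last chunk). The helper names `collect_stream` and `fake_stream` are hypothetical, and the stand-in stream and its token counts are invented so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Drain a streamed chat completion and return (text, usage).

    usage is None if no chunk carried a usage object.
    """
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-bearing chunk has an empty choices list,
        # so this inner loop simply does nothing for it.
        for choice in chunk.choices:
            content = getattr(choice.delta, "content", None)
            if content:
                parts.append(content)
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage


def fake_stream():
    """Hypothetical stand-in for the SDK stream object."""
    def text_chunk(s):
        delta = SimpleNamespace(content=s)
        return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)

    yield text_chunk("Hel")
    yield text_chunk("lo")
    # Final chunk: no choices, usage populated (numbers are made up).
    yield SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=9, completion_tokens=3, total_tokens=12),
    )


text, usage = collect_stream(fake_stream())
print(text, usage.total_tokens)
```

With the official Python SDK, the same helper should accept the iterator returned by `client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})`. If `usage` comes back as None, treat that as a signal that the option was not set or that an intermediate library dropped the final chunk, and do not record the call as costing zero tokens.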
Journey Context:
Developers often assume that streaming responses carry the same usage metadata as non-streaming ones. By default, however, OpenAI's streaming API sends no usage object at all. When usage is requested via stream_options.include_usage, it arrives only in one extra final chunk whose choices list is empty, and depending on the client library that chunk may not be surfaced at all. Teams then build cost tracking by summing chunks or by running a client-side tokenizer (tiktoken), but these approaches do not account for hidden system tokens, stop sequences, or special tokens added by the API, which this report estimates causes 10-15% under-reporting of actual billed tokens.
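To make the size of the gap concrete, a small sketch of how a client-side estimate could be compared against the API-reported total. The function name `underreport_pct` is hypothetical and the numbers are illustrative only, not measurements.

```python
def underreport_pct(estimated_tokens, billed_tokens):
    """Percentage by which a client-side estimate falls short of billed tokens."""
    if billed_tokens <= 0:
        raise ValueError("billed_tokens must be positive")
    return 100 * (billed_tokens - estimated_tokens) / billed_tokens


# Illustrative numbers only: a tokenizer estimate of 870 tokens
# against an API-reported total of 1000 tokens.
gap = underreport_pct(estimated_tokens=870, billed_tokens=1000)
print(f"client estimate is {gap:.1f}% below billed usage")
```

Logging this figure per request, with `billed_tokens` taken from the API's usage object, is one way to check whether the reported 10-15% range holds for a given workload.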
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:07:38.536264+00:00— report_created — created