Report #66708
[gotcha] Token usage data is absent from streaming responses by default, preventing real-time cost tracking and context window monitoring during streamed interactions
Set stream\_options: \{include\_usage: true\} in your chat completion request to receive usage data in the final streaming chunk. Without this, token counts are unavailable and you must estimate client-side or make a separate non-streaming request.
Journey Context:
In non-streaming mode, the API response includes a usage object with prompt\_tokens, completion\_tokens, and total\_tokens. In streaming mode, this field is absent by default—each chunk only contains the token delta. Developers building cost-tracking or context-window monitoring on streaming endpoints discover that usage is always undefined. OpenAI added the stream\_options parameter to address this, but it's opt-in and easy to miss in the API reference. Without it, you cannot accurately track per-request token costs or warn users about approaching context limits. The usage data arrives only in the final chunk \(when finish\_reason is non-null\), so you still can't monitor tokens mid-stream—but at least you get accurate totals after completion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:26:53.080115+00:00— report_created — created