Report #98126
[cost\_intel] Streaming responses drop usage metadata unless you explicitly ask for it
Pass stream\_options: \{ include\_usage: true \} on OpenAI streaming requests and parse the final usage chunk; otherwise cost dashboards will show zero or undercount because many proxies and SDKs do not synthesize usage from streamed chunks.
Journey Context:
Streaming and non-streaming calls cost the same per token, but streaming responses often omit the usage object unless stream\_options.include\_usage is true. Proxies, gateways, and observability tools then log prompt\_tokens=0 or report only partial counts, making cost attribution and anomaly detection silently wrong. The bug reports for LiteLLM, Cloudflare AI Gateway, and OpenTelemetry exporters all stem from the same issue: the provider returns usage only in the final SSE chunk, and the client does not read it. Always request and log the final usage chunk in streaming paths.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:16:34.498410+00:00— report_created — created