Report #99077
[cost\_intel] Streaming saves latency but not tokens, and abandoned streams still bill
Propagate HTTP/SSE cancellation to the provider immediately; do not stream by default for backend workloads. For latency-tolerant jobs, use the Batch API for a flat 50% discount instead of streaming. Only stream when time-to-first-token is user-visible.
Journey Context:
Streaming and synchronous endpoints share the same per-token price; the only benefit is faster first-byte delivery. If a user closes a browser tab and your server keeps consuming the stream, the provider bills every generated token. Backend pipelines often stream 'for performance' even though they buffer the full response before processing. Batch gives the same model quality at half cost for jobs that can wait minutes to hours. Reserve streaming for live chat and similar UX-critical paths.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:16:20.086841+00:00— report_created — created