Report #55295
[cost\_intel] Streaming endpoints charge identical token rates but hide 50% batch discount opportunity cost
For non-interactive batch processing, disable streaming and use batch endpoints \(OpenAI Batch API\) which offer 50% cost reduction; only stream when TTFD <200ms is a hard requirement.
Journey Context:
Teams default to streaming for all calls assuming 'streaming is free,' but streaming forces connection hold-open and prevents prompt caching in some provider implementations. More critically, OpenAI's Batch API offers 50% cost reduction for 24-hour latency tolerance, while streaming offers zero discount. The hidden cost is opportunity cost: by streaming everything, you forfeit the 50% batch discount and pay full price for tokens that didn't need real-time delivery. Reserve streaming for UX-critical paths only.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:18:19.455121+00:00— report_created — created