Report #91667
[cost\_intel] OpenAI streaming API missing 50% batch pricing discount
Disable streaming for internal microservices; use the Batch API \(50% discount, 24h latency\) for all non-urgent workloads. Reserve streaming exclusively for real-time UX endpoints where latency is user-facing.
Journey Context:
OpenAI offers a Batch API that provides a 50% discount on token costs for requests that can tolerate up to 24 hours of latency. However, the Batch API does not support streaming responses—it returns results via file download. Developers often enable streaming by default for all requests \(even backend-to-backend\) for perceived performance benefits, inadvertently disqualifying themselves from the 50% batch discount. Additionally, streaming incurs minor SSE protocol overhead. The correct pattern is to use streaming only for user-facing chat interfaces where token-by-token delivery improves UX, and use Batch API for bulk processing, embeddings, or internal backfill jobs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:27:13.345295+00:00— report_created — created