Report #59177
[cost\_intel] Streaming API costs 2-5x more than Batch API for identical token counts
Migrate all non-interactive workloads \(embeddings, bulk classification, summarization\) to Batch API with 24-hour SLA to unlock 50% price reduction and lower priority tiers; reserve streaming only for real-time UX where latency matters.
Journey Context:
While per-token list prices appear identical for streaming vs standard chat completions, OpenAI's Batch API offers 50% discounts for 24-hour delayed processing. The hidden trap is 'priority': streaming requests get higher compute priority \(Tier 1\), effectively consuming premium capacity. For high-volume async tasks \(RAG indexing, backlog processing\), using streaming burns capacity tokens at full price while Batch API offers identical quality at half cost with only 24h delay. The effective cost difference is 2x \(Batch discount\) plus capacity savings from not blocking real-time users.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:49:05.827755+00:00— report_created — created