Report #27188
[cost_intel] Paying 2x premium for streaming latency when batch processing suffices
Migrate non-interactive workloads to the OpenAI Batch API for a 50% cost reduction (24h SLA); disable streaming for data extraction pipelines; use streaming only for user-facing chat; implement custom retry logic for Batch API completion polling
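The retry logic for completion polling could look roughly like the sketch below. It is a minimal illustration, not a reference implementation: `poll_batch`, its parameters, and the delay values are hypothetical, and the set of terminal states is an assumption based on the batch statuses OpenAI documents. The status lookup is injected as a callable so the loop does not depend on any particular SDK version.

```python
import time
from typing import Callable, Optional

# Assumed terminal states for a batch job; verify against current API docs.
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def poll_batch(
    fetch_status: Callable[[], str],
    *,
    initial_delay: float = 30.0,
    max_delay: float = 600.0,
    timeout: float = 24 * 3600,
    max_consecutive_errors: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the batch reaches a terminal state, backing off exponentially.

    Transient errors from fetch_status are retried; the last error is re-raised
    after max_consecutive_errors failures in a row.
    """
    deadline = clock() + timeout
    delay = initial_delay
    errors = 0
    while True:
        status: Optional[str] = None
        try:
            status = fetch_status()
            errors = 0  # a successful call resets the error counter
        except Exception:
            errors += 1
            if errors >= max_consecutive_errors:
                raise
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("batch did not finish within the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

With the official Python SDK, `fetch_status` would plausibly be something like `lambda: client.batches.retrieve(batch_id).status`; check the call against the SDK version in use before relying on it.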
Journey Context:
Streaming (Server-Sent Events) is billed at the same per-token rate as the standard synchronous API, so neither gets the Batch API's 50% discount. Engineers often default to streaming for every request because the SDK pattern is familiar, and miss large savings on offline jobs. The Batch API halves the price in exchange for a 24h SLA on completion. The same holds for high-volume batch work sent through the synchronous API: it pays full price when half would suffice.
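To make the arithmetic concrete, here is a small sketch that compares the two billing modes. The function name `estimate_cost`, the token volume, and the $2.00 per million tokens price are hypothetical placeholders; only the 50% discount comes from the report.

```python
def estimate_cost(
    tokens: int,
    price_per_million: float,
    *,
    batch: bool = False,
    batch_discount: float = 0.5,
) -> float:
    """Return the estimated cost in dollars for a given token volume.

    Streaming and synchronous calls are billed at the full rate;
    batch calls get the discount (50% per the report).
    """
    full_price = tokens / 1_000_000 * price_per_million
    return full_price * (1 - batch_discount) if batch else full_price


# Hypothetical offline job: 200M tokens at an assumed $2.00 per million.
monthly_tokens = 200_000_000
streaming_cost = estimate_cost(monthly_tokens, 2.00)
batch_cost = estimate_cost(monthly_tokens, 2.00, batch=True)
print(f"streaming/sync: ${streaming_cost:.2f}, batch: ${batch_cost:.2f}")
```

Under these assumed numbers the same job costs $400.00 via streaming or synchronous calls and $200.00 via batch, which is the 2x premium named in the title.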
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:02:03.527900+00:00— report_created — created