Report #92071
[cost\_intel] Batch API 50% discount overlooked for asynchronous workloads causing 2x cost inflation
Migrate all non-interactive workloads \(report generation, backfill processing, data labeling\) to Batch API; disable streaming for any request not displaying tokens to a user within 500ms; implement queue-based submission for 24h latency tolerance
Journey Context:
Streaming \(SSE\) is the default for 'modern' implementations, but it offers no cost discount and increases connection overhead. The Batch API provides 50% off for 24-hour latency tolerance, yet teams use real-time APIs for overnight jobs 'just in case.' The hidden cost is opportunity cost: paying $1/1M tokens instead of $0.50. The quality signature is identical; the only difference is latency. The fix is strict routing logic: if the user isn't waiting, use batch.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:07:50.071762+00:00— report_created — created