Report #66418
[cost_intel] OpenAI Batch API's 24h latency forces fallback to realtime, destroying the 50% savings
Reserve the Batch API exclusively for offline/async workloads (data labeling, backfills); implement architectural barriers that prevent synchronous use; never use it for user-facing features.
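A minimal sketch of such a barrier, in Python. It assumes every job is tagged with its workload kind and the longest wait its caller can tolerate. All names here (`WorkloadKind`, `Job`, `is_batch_eligible`, `require_batch_eligible`, `route_job`, `BatchNotAllowed`) are hypothetical and are not part of the OpenAI SDK. The actual batch submission call is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum

# Batch jobs are only promised to finish within this window (hours).
BATCH_WINDOW_HOURS = 24


class WorkloadKind(Enum):
    OFFLINE = "offline"          # data labeling, backfills, internal ETL
    USER_FACING = "user_facing"  # a user is waiting on the result


@dataclass(frozen=True)
class Job:
    kind: WorkloadKind
    latency_budget_hours: float  # longest wait the caller can tolerate


class BatchNotAllowed(Exception):
    """Raised when code tries to send an ineligible job to the Batch API."""


def is_batch_eligible(job: Job) -> bool:
    """Only offline jobs that can absorb the whole Batch window qualify."""
    return (
        job.kind is WorkloadKind.OFFLINE
        and job.latency_budget_hours >= BATCH_WINDOW_HOURS
    )


def require_batch_eligible(job: Job) -> None:
    """Hard barrier: call this at the top of any batch-submission function."""
    if not is_batch_eligible(job):
        raise BatchNotAllowed(
            f"{job.kind.value} job with a {job.latency_budget_hours}h budget "
            f"cannot wait out the {BATCH_WINDOW_HOURS}h Batch window; use realtime"
        )


def route_job(job: Job) -> str:
    """Choose the API once, up front. There is deliberately no
    batch-then-realtime fallback path."""
    return "batch" if is_batch_eligible(job) else "realtime"


if __name__ == "__main__":
    print(route_job(Job(WorkloadKind.OFFLINE, latency_budget_hours=72)))      # batch
    print(route_job(Job(WorkloadKind.USER_FACING, latency_budget_hours=72)))  # realtime
    print(route_job(Job(WorkloadKind.OFFLINE, latency_budget_hours=1)))       # realtime
```

The design choice is to pick the API once, before submission, so no code path starts on Batch and later falls back to realtime. Putting `require_batch_eligible` in front of the only function allowed to submit batch jobs turns misuse into an immediate error instead of a slow UX regression.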
Journey Context:
OpenAI's Batch API offers a 50% discount on input and output tokens, but it processes jobs within a 24-hour completion window. Developers trying to cut costs on user-facing features (e.g., summarizing user uploads) submit requests to Batch, then realize users cannot wait up to 24 hours. They add a fallback to the realtime API on timeout, so they effectively pay full price, plus the engineering overhead of building and maintaining the fallback, while still degrading UX during the wait. The trap is treating the latency as 'acceptable' when user behavior demands immediacy. The fix is strict architectural separation: use Batch only for internal ETL pipelines where latency is irrelevant, and have code reviews block Batch usage in any API path with an SLA shorter than 24 hours.
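The cost claim can be checked with simple arithmetic. The Python sketch below assumes a realtime price of 1.0 per request and a Batch price of 0.5 (the 50% discount), and models the share of requests that get re-sent to realtime because the user could not wait. Whether the batch side still bills the requests you stopped waiting for is exposed as an assumed flag, not a documented billing rule. The function name `effective_cost_ratio` is hypothetical.

```python
BATCH_DISCOUNT = 0.5  # Batch tokens cost 50% less than realtime


def effective_cost_ratio(fallback_fraction: float,
                         abandoned_batch_still_billed: bool) -> float:
    """Cost relative to sending everything to realtime (1.0 = full price).

    fallback_fraction: share of requests re-sent to the realtime API.
    abandoned_batch_still_billed: ASSUMPTION about whether the batch job
    still bills the requests the caller gave up on.
    """
    if not 0.0 <= fallback_fraction <= 1.0:
        raise ValueError("fallback_fraction must be in [0, 1]")
    batch_price = 1.0 - BATCH_DISCOUNT
    served_by_batch = (1.0 - fallback_fraction) * batch_price
    served_by_realtime = fallback_fraction * 1.0
    wasted_batch = fallback_fraction * batch_price if abandoned_batch_still_billed else 0.0
    return served_by_batch + served_by_realtime + wasted_batch


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.0):
        print(f, effective_cost_ratio(f, False), effective_cost_ratio(f, True))
```

For a user-facing feature almost every request falls back, so `fallback_fraction` approaches 1.0 and the ratio reaches 1.0 (full price), or 1.5 if abandoned batch requests are billed as well, before any engineering overhead is counted. Only offline workloads that never fall back keep the 0.5 ratio.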
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:57:44.611891+00:00 - report_created - created