Report #76430
[cost\_intel] Batch API latency destroys UX for interactive workflows
Reserve OpenAI Batch API for asynchronous pipelines only \(nightly ETL, backfill processing\). Interactive applications requiring <5s latency must use standard chat completions; the 24-hour SLA on Batch makes it unsuitable for real-time use.
Journey Context:
Teams see '50% cheaper' on Batch API and attempt to route all traffic through it. OpenAI's Batch API has a 24-hour processing SLA with no latency guarantees. It is designed for offline data processing, not user-facing chat. The failure mode is catastrophic: user queries sit in queue for hours. Correct architecture: use Batch for bulk classification, embedding generation, or report generation that runs overnight; never for chatbots or live recommendations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:52:53.550419+00:00— report_created — created