Report #38382
[cost_intel] When is OpenAI's Batch API cheaper than synchronous calls despite the 24-hour SLA?
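The Batch API trades latency for price: requests are discounted 50%, but results may take up to 24 hours to arrive. For non-interactive pipelines (nightly embedding generation, offline evaluation), that discount undercuts the same requests made synchronously. For latency-sensitive RAG, a wait of up to 24 hours destroys the UX. Use Batch only when the consumer of the results can tolerate a delay of at least 24 hours.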
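The rule above can be written as a small scheduling check. This is a minimal sketch, not an official client feature: `can_use_batch` and `BATCH_COMPLETION_WINDOW` are hypothetical names, and the sketch assumes the worst case, where a batch takes the full 24-hour window to complete.

```python
from datetime import datetime, timedelta, timezone

# Assumption: plan for the full 24-hour completion window, even though
# individual batches may finish sooner.
BATCH_COMPLETION_WINDOW = timedelta(hours=24)


def can_use_batch(
    submitted_at: datetime,
    needed_by: datetime,
    window: timedelta = BATCH_COMPLETION_WINDOW,
) -> bool:
    """Return True only if the deadline leaves room for the full window."""
    return needed_by - submitted_at >= window


# Overnight job submitted at 22:00 UTC, report due at 06:00 the next day:
# only 8 hours of slack, so the batch route is not safe.
submitted = datetime(2026, 6, 18, 22, 0, tzinfo=timezone.utc)
morning_report = can_use_batch(submitted, submitted + timedelta(hours=8))

# Offline evaluation whose results are needed two days later: safe.
offline_eval = can_use_batch(submitted, submitted + timedelta(hours=48))

print(morning_report, offline_eval)
```

Treating the window as a hard worst case is deliberately conservative: a pipeline that relies on batches finishing early has no guarantee to fall back on.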
Journey Context:
Teams conflate "background job" with Batch API suitability. The constraint is the SLA, not the fact that the work is asynchronous. If you need results within 4 hours (e.g., a morning report built from overnight data), the Batch API's 24-hour completion window cannot guarantee delivery in time. Where the window does fit, the 50% savings are substantial for embeddings (text-embedding-3-large drops from $0.13 to $0.065 per 1M tokens at list price), which makes it economical to embed an entire document store nightly.
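To make the savings concrete, the arithmetic can be sketched as follows. The function name `embedding_cost`, the 500M-token corpus size, and the per-token price passed in are illustrative assumptions, not figures from the report; substitute the current rate from the pricing page before relying on the numbers.

```python
def embedding_cost(
    total_tokens: int,
    price_per_1m_tokens: float,
    batch_discount: float = 0.5,  # the 50% Batch API discount
) -> dict:
    """Compare synchronous and batch cost for one embedding run."""
    sync_cost = total_tokens / 1_000_000 * price_per_1m_tokens
    batch_cost = sync_cost * (1 - batch_discount)
    return {
        "sync": sync_cost,
        "batch": batch_cost,
        "saved": sync_cost - batch_cost,
    }


# Hypothetical nightly run: a 500M-token document store at an assumed
# synchronous price of $0.13 per 1M tokens.
estimate = embedding_cost(total_tokens=500_000_000, price_per_1m_tokens=0.13)

for label, dollars in estimate.items():
    print(f"{label}: ${dollars:,.2f}")
```

Because the discount is a flat multiplier, the absolute saving scales linearly with corpus size, so the case for Batch gets stronger as the nightly volume grows.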
Lifecycle
2026-06-18T18:54:12.924217+00:00— report_created — created