Report #69341
[cost_intel] When does the OpenAI Batch API beat real-time for embedding pipelines?
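Use the OpenAI Batch API for embedding backfills of more than 100k documents: cost drops 50% in exchange for a completion window of up to 24 hours. For latency-sensitive jobs that must finish in under 24 hours, use a real-time model such as voyage-3 instead.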
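A minimal sketch of preparing such a backfill, assuming the `openai` Python package and the Batch API's JSONL input format for the `/v1/embeddings` endpoint. The model name, the number of inputs per request, and the helper names (`build_batch_lines`, `submit_batch`) are illustrative assumptions, not something this report specifies.

```python
import json
from typing import Iterator, List

# Assumptions for illustration only (not from the report).
MODEL = "text-embedding-3-small"
INPUTS_PER_REQUEST = 100


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_lines(texts: List[str], model: str = MODEL,
                      per_request: int = INPUTS_PER_REQUEST) -> List[str]:
    """Return one JSON line per embedding request for a batch input file."""
    lines = []
    for i, group in enumerate(chunked(texts, per_request)):
        request = {
            "custom_id": f"embed-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": group},
        }
        lines.append(json.dumps(request))
    return lines


def submit_batch(jsonl_path: str):
    """Upload the input file and create a batch job with a 24h window.

    Requires the `openai` package and an OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # imported lazily so the builder runs without it

    client = OpenAI()
    with open(jsonl_path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
```

Write the lines from `build_batch_lines` to a `.jsonl` file, one per line, before calling `submit_batch`. Check the current per-batch request and file-size limits in the provider documentation; a large backfill may need to be split across several batch files.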
Journey Context:
Teams often run embedding pipelines synchronously, paying full price for real-time responses they don't need. OpenAI's Batch API offers a 50% discount in exchange for a 24-hour completion window, making it a good fit for nightly backfills. For latency-sensitive RAG updates, however, voyage-3's 32k-token context reduces chunk count by 3x, which often beats OpenAI's effective cost despite higher per-token rates. The common mistakes are using the Batch API for small jobs (<1000 requests), where overhead dominates, and using it on latency-critical paths, where a 24-hour turnaround breaks the product.
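The trade-off above can be sketched as a small cost model plus a routing rule. This is a rough illustration under stated assumptions: the report does not define "effective cost", so the sketch assumes it is token cost plus a per-vector downstream cost (storage, indexing) that shrinks when chunk count shrinks. All rates are placeholders supplied by the caller, not quoted prices, and the names `EmbeddingOption`, `effective_cost`, and `choose_path` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingOption:
    name: str
    usd_per_million_tokens: float  # placeholder rate, not a quoted price
    discount: float                # 0.5 models the Batch API discount, else 0.0
    chunk_count_divisor: float     # 3.0 models the report's "3x fewer chunks"


def effective_cost(option: EmbeddingOption, corpus_tokens: int,
                   baseline_chunks: int, usd_per_vector: float) -> float:
    """Token cost after discount plus an assumed per-vector downstream cost."""
    token_cost = (corpus_tokens / 1_000_000
                  * option.usd_per_million_tokens
                  * (1 - option.discount))
    vector_cost = baseline_chunks / option.chunk_count_divisor * usd_per_vector
    return token_cost + vector_cost


def choose_path(num_requests: int, max_latency_hours: float) -> str:
    """Routing rule from the report's thresholds.

    Batch only when the job can wait the full 24 hours and is not so small
    (<1000 requests) that batch overhead dominates.
    """
    if max_latency_hours < 24:
        return "real-time"
    if num_requests < 1000:
        return "real-time"
    return "batch"
```

Plugging in your own measured rates and chunk counts for each option shows where the crossover sits for a given corpus; the placeholder fields carry no pricing information on their own.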
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:52:33.911991+00:00 - report_created - created