Report #46336
[cost_intel] When does using the Batch API beat synchronous embedding calls?
Use OpenAI's Batch API for embedding jobs of more than about 1,000 documents: it costs 50% less and offers higher rate limits, with results returned within a 24-hour completion window. It suits offline RAG indexing, not real-time queries.
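As a minimal sketch of what such a job looks like, the snippet below builds the JSONL input that the Batch API expects for the `/v1/embeddings` endpoint, one request per line. The helper name `build_batch_lines` and the `doc-<n>` ID scheme are hypothetical; the upload and submit calls appear only as comments and should be checked against the current OpenAI SDK documentation before use.

```python
import json


def build_batch_lines(docs, model="text-embedding-3-large"):
    """Return one JSONL line per document in the Batch API request format.

    `build_batch_lines` and the "doc-<n>" custom_id scheme are illustrative
    choices, not part of the original report.
    """
    lines = []
    for i, text in enumerate(docs):
        request = {
            "custom_id": f"doc-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


if __name__ == "__main__":
    with open("embeddings_batch.jsonl", "w", encoding="utf-8") as f:
        f.write("\n".join(build_batch_lines(["first document", "second document"])))

    # Submission (not executed here; assumes the `openai` Python package and an
    # API key in the environment, so verify against the current SDK docs):
    #   client = OpenAI()
    #   uploaded = client.files.create(
    #       file=open("embeddings_batch.jsonl", "rb"), purpose="batch")
    #   client.batches.create(input_file_id=uploaded.id,
    #                         endpoint="/v1/embeddings",
    #                         completion_window="24h")
```

Results arrive as a separate output file, so the `custom_id` is the only reliable way to pair each embedding with its source document.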
Journey Context:
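Synchronous embedding endpoints face aggressive rate limits (e.g., 3M tokens/min) and cost $0.13/1M tokens (text-embedding-3-large). The Batch API costs $0.065/1M tokens. Consider indexing 10M documents, which is about 10B tokens if documents average roughly 1,000 tokens (the size the figures here imply). The synchronous route costs $1300 and takes days: at 3M tokens/min, 10B tokens need at least about 56 hours even with no throttling errors or retries. The Batch API costs $650 and can complete overnight without rate limit errors. The traps are using Batch for real-time user queries, where results that may take up to 24 hours are unusable, and using it for small batches (<100 docs), where the overhead outweighs the savings.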
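The arithmetic behind these figures can be reproduced with a short sketch. The prices and the 3M tokens/min limit are taken from this report and may not match current pricing or a given account's limits; the 1,000 tokens-per-document average is an assumption inferred from the $1300 figure, and `estimate` is a hypothetical helper name.

```python
PRICE_SYNC_PER_M = 0.13      # USD per 1M tokens, synchronous (from this report)
PRICE_BATCH_PER_M = 0.065    # USD per 1M tokens, Batch API (from this report)
RATE_LIMIT_TPM = 3_000_000   # example synchronous limit, tokens per minute
TOKENS_PER_DOC = 1_000       # ASSUMPTION: average document length in tokens


def estimate(num_docs, tokens_per_doc=TOKENS_PER_DOC):
    """Estimate cost for both routes and the minimum synchronous wall time."""
    total_tokens = num_docs * tokens_per_doc
    millions = total_tokens / 1_000_000
    return {
        "total_tokens": total_tokens,
        "sync_cost_usd": millions * PRICE_SYNC_PER_M,
        "batch_cost_usd": millions * PRICE_BATCH_PER_M,
        # Lower bound: assumes the rate limit is saturated with no retries.
        "sync_min_hours": total_tokens / RATE_LIMIT_TPM / 60,
    }


if __name__ == "__main__":
    for key, value in estimate(10_000_000).items():
        print(f"{key}: {value:,.1f}")
```

Because the synchronous time is a lower bound, real runs with throttling and retries would take longer, which widens the gap in favor of the Batch API for large offline jobs.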
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:14:54.409380+00:00— report_created — created