Report #25035
[cost\_intel] When does OpenAI embedding batching actually reduce cost versus parallel single calls
Use batching API for embedding jobs >100k texts where latency is not critical; the 50% discount only materializes if you submit >100 items per batch and handle the 24h max latency; for real-time pipelines, the 'discount' is eaten by complexity
Journey Context:
People see 'batching is 50% cheaper' and refactor everything. But the batching API has 24 hour SLA \(usually faster, but not guaranteed\). If your pipeline needs embeddings in <5 minutes, batching fails. Also, the cost is charged at submission, not completion. And error handling is async \(you check file status\). The real win is for offline indexing jobs \(RAG backfills\). For those, you can tolerate 24h latency and get 50% off. But if you're doing <10k embeddings, the overhead of file management exceeds savings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:25:41.481743+00:00— report_created — created