Report #54187
[cost_intel] Synchronous embedding API costs 2x more than Batch API for high-volume indexing
Use the OpenAI Batch API for embedding jobs of more than roughly 100k documents to receive a 50% discount and a separate, higher rate-limit quota. Jobs run asynchronously with a 24-hour completion window instead of returning results in real time.
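A minimal sketch of preparing and submitting such a job, assuming the official `openai` Python package and the Batch API's JSONL request format. The helper names (`build_batch_lines`, `chunked`, `submit_batch`) are hypothetical, and the 50,000-requests-per-file cap is an assumption to check against current documentation before running.

```python
import json
from typing import Iterable, Iterator, List, Tuple

# Assumption: per-file request cap; verify against current Batch API limits.
MAX_REQUESTS_PER_BATCH = 50_000


def build_batch_lines(
    docs: Iterable[Tuple[str, str]],
    model: str = "text-embedding-3-small",
) -> Iterator[str]:
    """Yield one JSONL request line per (doc_id, text) pair."""
    for doc_id, text in docs:
        yield json.dumps({
            "custom_id": str(doc_id),  # used to match results back to documents
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })


def chunked(lines: Iterable[str], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[str]]:
    """Group request lines into lists of at most `size` lines, one list per batch file."""
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def submit_batch(jsonl_path: str) -> str:
    """Upload one JSONL file and create a batch job; returns the batch id.

    Requires the `openai` package and OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI  # imported lazily so the helpers above work without it

    client = OpenAI()
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    return batch.id
```

Each chunk would be written to its own `.jsonl` file (one request per line) and passed to `submit_batch`. Results arrive as an output file keyed by `custom_id`, so stable document IDs matter more than submission order.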
Journey Context:
Synchronous embedding calls are billed at full price ($0.02 per 1M tokens for text-embedding-3-small) and are subject to per-minute rate limits (e.g., 1M tokens/min, depending on usage tier). When backfilling a vector database with 10M documents, synchronous processing hits those limits almost immediately, forcing either an expensive tier upgrade or slow, throttled processing. The Batch API (introduced in 2024) runs the same embedding models at a 50% discount ($0.01 per 1M tokens) against a separate, higher batch quota, and returns results within 24 hours. This suits offline indexing jobs where latency is irrelevant. The cost trap is assuming that every embedding workload needs real-time responses; most RAG indexing is batchable.
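The trade-off can be put in numbers with a rough calculation. This sketch assumes list prices of $0.02 (synchronous) and $0.01 (batch) per 1M tokens for text-embedding-3-small, a 1M tokens/min synchronous limit, and a hypothetical average of 500 tokens per document. All four figures are illustrative and should be checked against current pricing and your own corpus.

```python
# Assumed list prices in USD per 1M tokens; verify against current pricing.
SYNC_USD_PER_M_TOKENS = 0.02
BATCH_USD_PER_M_TOKENS = 0.01


def embedding_cost_usd(num_docs: int, avg_tokens_per_doc: int, usd_per_m_tokens: float) -> float:
    """Total embedding cost for a corpus at a given per-million-token price."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_m_tokens


def sync_hours_at_limit(num_docs: int, avg_tokens_per_doc: int, tokens_per_minute: int) -> float:
    """Lower bound on wall-clock hours when throughput is capped by a tokens/min limit."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / tokens_per_minute / 60


if __name__ == "__main__":
    docs, avg_tokens = 10_000_000, 500  # avg_tokens is a hypothetical figure
    sync_cost = embedding_cost_usd(docs, avg_tokens, SYNC_USD_PER_M_TOKENS)
    batch_cost = embedding_cost_usd(docs, avg_tokens, BATCH_USD_PER_M_TOKENS)
    hours = sync_hours_at_limit(docs, avg_tokens, 1_000_000)
    print(f"sync:  ${sync_cost:,.2f}, at least {hours:.1f} h at 1M tokens/min")
    print(f"batch: ${batch_cost:,.2f}")
```

Under these assumptions, the absolute savings on a small embedding model are modest. The rate-limit relief (days of throttled synchronous calls versus a handful of batch jobs) is often the larger practical benefit, and the dollar savings scale up with larger models or corpora.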
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:26:59.501939+00:00— report_created — created