Report #46279
[cost\_intel] Sequential API calls hurting throughput and inflating cost on high-volume embedding jobs
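Use OpenAI's Batch API for latency-tolerant embedding jobs, packing roughly 96-100 chunks into each embedding request. Batches complete within a 24-hour window at a 50% price discount compared with synchronous calls; the report also claims about 10x higher throughput, which is unverified. Reported break-even is above 1,000 embeddings per job.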
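The discount arithmetic can be sketched as follows. This is a minimal illustration, not a pricing reference: the function name `embedding_cost`, the $0.02 per 1M tokens price, and the 500-token chunk size are all assumptions chosen for the example, so substitute current prices before relying on the numbers.

```python
def embedding_cost(total_tokens: int, price_per_million: float, discount: float = 0.0) -> float:
    """Dollar cost for total_tokens at a per-million-token price, after a fractional discount."""
    return total_tokens / 1_000_000 * price_per_million * (1.0 - discount)


# Hypothetical job: 1M chunks of ~500 tokens each, at an ASSUMED $0.02 per 1M tokens.
tokens = 1_000_000 * 500
sync_cost = embedding_cost(tokens, price_per_million=0.02)
batch_cost = embedding_cost(tokens, price_per_million=0.02, discount=0.5)  # 50% Batch API discount
print(f"sync ${sync_cost:.2f}, batch ${batch_cost:.2f}")
```

Under these assumptions the batch job costs exactly half the synchronous one; any saving beyond that would have to come from somewhere other than the discount.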
Journey Context:
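High-volume embedding pipelines \(for example, indexing 1M documents\) become uneconomical when every chunk is sent as its own real-time API call. OpenAI's Batch API \(introduced in April 2024\) accepts many requests at once as an uploaded JSONL file, rather than in a single HTTP call, and returns results within a 24-hour window at a 50% discount. For latency-tolerant RAG index builds, the report estimates unit cost falling from $0.10/1k pages to $0.01/1k pages; the 50% discount alone accounts only for a halving, so the rest of that estimate is unverified. Common errors: using the synchronous API with rate-limit backoff \(slow and full price\), or batching only on the client side against the synchronous endpoint \(fewer requests, but no discount, since embeddings are billed per token\). The report recommends 96-100 chunks per embedding request. This is a sizing choice rather than an OpenAI limit: the embeddings endpoint documents a maximum of 2,048 inputs per request, and a Batch API input file can bundle up to 50,000 requests.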
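A minimal sketch of the workflow described above: group chunks into embedding requests, write one JSONL line per request, then upload the file and create a batch. The helper names \(`chunked`, `build_batch_lines`, `submit_batch`\), the `embed-N` custom ID scheme, and the model name are illustrative assumptions; `submit_batch` assumes the official `openai` Python package and an `OPENAI_API_KEY` in the environment, and is not called here.

```python
import json
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_lines(chunks: List[str], model: str, per_request: int = 96) -> List[str]:
    """Return one JSONL line per embedding request, each with up to `per_request` chunks."""
    lines = []
    for i, group in enumerate(chunked(chunks, per_request)):
        request = {
            "custom_id": f"embed-{i}",  # hypothetical scheme for matching results to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": group},
        }
        lines.append(json.dumps(request))
    return lines


def submit_batch(jsonl_path: str):
    """Upload the JSONL file and create a batch job (requires network access and an API key)."""
    from openai import OpenAI  # imported lazily so the builder above works offline

    client = OpenAI()
    with open(jsonl_path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )


if __name__ == "__main__":
    docs = [f"chunk {n}" for n in range(250)]  # placeholder data
    lines = build_batch_lines(docs, model="text-embedding-3-small")
    print(len(lines))  # 250 chunks at 96 per request -> 3 requests
```

Results arrive asynchronously as an output file keyed by `custom_id`, so the pipeline needs a separate polling and download step that is not shown here.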
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:09:10.948830+00:00— report_created — created