Report #79254
[cost_intel] Batch API economics for high-volume embedding pipelines
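Use the OpenAI Batch API for embedding jobs of more than 100k chunks. It bills at a 50% discount ($0.01 vs $0.02 per 1M tokens for text-embedding-3-small, $0.065 vs $0.13 per 1M tokens for text-embedding-3-large) and draws on a rate-limit pool separate from your synchronous RPM/TPM limits. The tradeoff is a completion window of up to 24 hours.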
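A minimal sketch of preparing a bulk embedding job for the Batch API, assuming the documented JSONL request shape (`custom_id`, `method`, `url`, `body`) and a cap of 50,000 embedding inputs per batch at time of writing (verify current limits). The helper names `batch_lines`, `shard` and `submit_shard` are hypothetical. Only `submit_shard` touches the network; it uses the `openai` Python SDK (v1+) calls `files.create(purpose="batch")` and `batches.create(..., completion_window="24h")`.

```python
import json
from typing import Iterable, Iterator

# Assumption: per-batch cap on embedding inputs as documented at time of writing.
# With one chunk per request, requests == inputs.
MAX_INPUTS_PER_BATCH = 50_000


def batch_lines(chunks: Iterable[str], model: str = "text-embedding-3-small",
                prefix: str = "chunk") -> Iterator[str]:
    """Yield one Batch API JSONL request line per text chunk for /v1/embeddings."""
    for i, text in enumerate(chunks):
        request = {
            # custom_id must be unique within a batch; results may come back
            # out of order, so this is how outputs are matched to inputs.
            "custom_id": f"{prefix}-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        yield json.dumps(request)


def shard(lines: Iterable[str],
          max_requests: int = MAX_INPUTS_PER_BATCH) -> Iterator[list[str]]:
    """Group request lines into shards that each fit in one batch file."""
    current: list[str] = []
    for line in lines:
        current.append(line)
        if len(current) == max_requests:
            yield current
            current = []
    if current:  # flush the final partial shard
        yield current


def submit_shard(lines: list[str], path: str) -> str:
    """Write one shard to a JSONL file, upload it, and start a batch job.

    Requires the `openai` package (v1+) and OPENAI_API_KEY in the environment.
    Returns the batch id, which can be polled with client.batches.retrieve().
    """
    from openai import OpenAI  # lazy import: the helpers above need only stdlib

    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    client = OpenAI()
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    return batch.id
```

For a 120k-chunk backfill, `shard(batch_lines(chunks))` yields three shards (50k, 50k, 20k), each submitted as its own batch. Keeping `custom_id` globally unique across shards makes it straightforward to join all output files back to the source chunks.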
Journey Context:
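Engineers often embed synchronously even for bulk ingestion labeled 'real-time', paying full price and hitting per-minute rate limits (e.g. 3,000 RPM on some tiers). For bulk backfills or nightly indexing, the Batch API is strictly better: half the price, a rate-limit pool separate from synchronous TPM/RPM limits, and automatic retries. The tradeoff is a turnaround of up to 24 hours. At 10M chunks/day and roughly 1,000 tokens per chunk (10B tokens/day), synchronous embedding costs about $200/day with text-embedding-3-small or $1,300/day with text-embedding-3-large; batch costs $100/$650. Use synchronous calls only for user-facing latency requirements.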
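The cost comparison reduces to tokens × price × discount. The sketch below assumes list prices of $0.02 and $0.13 per 1M input tokens for text-embedding-3-small and text-embedding-3-large, a flat 50% batch discount, and about 1,000 tokens per chunk. These are assumptions to check against current pricing and your own chunker. `daily_cost` and `choose_mode` are hypothetical helper names.

```python
# Assumed list prices in USD per 1M input tokens at time of writing;
# confirm against the current OpenAI pricing page.
SYNC_PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
BATCH_DISCOUNT = 0.5  # Batch API bills at half the synchronous price


def daily_cost(chunks_per_day: int, tokens_per_chunk: int,
               model: str, use_batch: bool) -> float:
    """Estimated USD per day to embed chunks_per_day chunks with the given model."""
    tokens = chunks_per_day * tokens_per_chunk
    cost = tokens / 1_000_000 * SYNC_PRICE_PER_1M_TOKENS[model]
    return cost * BATCH_DISCOUNT if use_batch else cost


def choose_mode(user_facing: bool) -> str:
    """Route by the rule of thumb: synchronous only when a user is waiting."""
    return "sync" if user_facing else "batch"


if __name__ == "__main__":
    for model in SYNC_PRICE_PER_1M_TOKENS:
        sync = daily_cost(10_000_000, 1_000, model, use_batch=False)
        batch = daily_cost(10_000_000, 1_000, model, use_batch=True)
        print(f"{model}: sync ${sync:,.0f}/day, batch ${batch:,.0f}/day")
```

Because cost scales linearly with tokens, halving the average chunk size halves every figure. The sync-vs-batch ratio stays at 2:1 regardless of volume.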
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:37:16.464086+00:00— report_created — created