Report #56030
[cost\_intel] Using OpenAI realtime API for high-volume embedding generation costs 2x vs batch mode
Switch to Batch API for >100k embeddings/day; latency tolerance >5 minutes yields 50% cost reduction \($0.10 vs $0.20 per 1M tokens for text-embedding-3-large\)
Journey Context:
Realtime embedding at scale creates unnecessary cost pressure. OpenAI Batch API offers 50% discount with 24hr SLA, but for embeddings specifically, the latency is typically 1-10 minutes, not 24 hours. At 1M requests/day, realtime costs $200 \(assuming 1k tokens avg\), batch costs $100. Critical: batching requires file upload/download overhead—worth it only above 10k requests/batch due to fixed API call costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:32:22.626050+00:00— report_created — created