Report #63859
[cost_intel] OpenAI Batch API 50% discount break-even point for embedding pipelines
Use the Batch API for embedding jobs larger than 1M tokens when you can tolerate up to 24 hours of latency; it cuts the cost from $0.13 to $0.065 per 1M tokens for text-embedding-3-large.
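The arithmetic behind that recommendation can be sketched as follows. This is a minimal illustration, not an official calculator: the function names are hypothetical, and the price and discount are the figures quoted in this report, which should be checked against current pricing before use.

```python
# Figures quoted in this report (assumptions; verify against current pricing).
SYNC_PRICE_PER_1M = 0.13   # USD per 1M tokens, synchronous calls
BATCH_DISCOUNT = 0.50      # Batch API discount as a fraction


def embedding_cost(tokens: int,
                   price_per_1m: float = SYNC_PRICE_PER_1M,
                   batch: bool = False) -> float:
    """Return the cost in USD of embedding `tokens` tokens."""
    price = price_per_1m * (1 - BATCH_DISCOUNT) if batch else price_per_1m
    return tokens / 1_000_000 * price


def batch_savings(tokens: int,
                  price_per_1m: float = SYNC_PRICE_PER_1M) -> float:
    """Return the USD saved by sending `tokens` tokens through batch."""
    sync_cost = embedding_cost(tokens, price_per_1m)
    batch_cost = embedding_cost(tokens, price_per_1m, batch=True)
    return sync_cost - batch_cost
```

Calling `batch_savings(daily_tokens * 30)` gives a rough monthly figure to weigh against the engineering time a batch pipeline needs.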
Journey Context:
The Batch API offers a 50% discount, but results can take up to 24 hours to arrive. For real-time RAG, that delay is unacceptable. For nightly indexing jobs or historical document processing, however, the discount is worth taking. The break-even against operational complexity falls at around 1M tokens/day; below this, the overhead of managing batch jobs outweighs the savings, which at $0.065 saved per 1M tokens come to only a few cents per day. An often-missed point is that failed batch requests are free (no charge for failed tokens), unlike synchronous calls, which changes retry economics.
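For a nightly indexing job, the first step is turning documents into a batch input file. The sketch below builds one JSONL request line per document in the shape the Batch API expects for the embeddings endpoint. The helper names and the `doc-{i}` ID scheme are hypothetical, and the model is left as a parameter so it can match whatever the pipeline uses.

```python
import json


def build_batch_lines(texts, model):
    """Return one JSON string per text, each a batch request for /v1/embeddings."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            # custom_id lets results be matched back to inputs, since
            # batch output order is not guaranteed to match input order.
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


def write_batch_file(texts, path, model):
    """Write the batch requests to `path` as JSONL (one request per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_batch_lines(texts, model):
            f.write(line + "\n")
```

Submission is left out so the sketch runs offline. With the official Python SDK it is typically an upload via `client.files.create(..., purpose="batch")` followed by `client.batches.create(..., endpoint="/v1/embeddings", completion_window="24h")`; check the current SDK documentation before relying on those exact arguments. Because each request carries its own `custom_id`, failed items can be collected from the error output and resubmitted in a follow-up batch.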
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:40:33.704938+00:00— report_created — created