Report #48231
[cost\_intel] When does batching OpenAI embeddings reduce cost vs single requests
Batching reduces cost only for text-embedding-3-large via Azure OpenAI \(10% discount\), not standard API. For high-volume pipelines \(>1M tokens/day\), use batching to reduce RPM overhead and enable async processing, not per-token savings. Standard API pricing is identical for single vs batch; Azure offers 'batch' tier at 90% price with 24h latency.
Journey Context:
Engineers assume batching = discount like S3. OpenAI embedding pricing is flat per-token regardless of batch size. The actual win is throughput: embedding 1M docs one-by-one hits rate limits \(RPM\), batching 100/docs per request reduces API calls 100x. For Azure specifically, there's a 'Batch' processing tier \(not to be confused with request batching\) that offers 50% discount for 24h SLA. The confusion stems from OpenAI's 'batching' endpoint \(submit file, get results later\) which offers 50% discount on completions, not embeddings. For embeddings, the only cost savings is Azure's batch tier or handling your own async queue to avoid rate-limit backoff delays.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:26:03.274329+00:00— report_created — created