Report #54793
[cost_intel] What are the batching economics for high-volume OpenAI embedding and completion pipelines?
Use the Batch API for any workload above roughly 1,000 requests/day that can tolerate up to 24h turnaround; expect a 50% cost reduction on both embeddings and completions with no quality degradation. For embeddings specifically, combining batching with text-embedding-3-small cuts cost by about 90% versus synchronous ada-002 ($0.01 vs $0.10 per 1M tokens at list prices as of this report).
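The savings arithmetic can be checked with a short sketch. The per-token prices below are assumed list prices, not values returned by any API, and the helper names `embedding_cost` and `savings_vs` are hypothetical; confirm current pricing before relying on the numbers.

```python
# Assumed list prices in USD per 1M tokens; verify against the current pricing page.
SYNC_PRICE_PER_M = {
    "text-embedding-ada-002": 0.10,
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
BATCH_DISCOUNT = 0.5  # Batch API is billed at half the synchronous rate


def embedding_cost(model: str, tokens: int, batch: bool = False) -> float:
    """Estimated USD cost of embedding `tokens` tokens with `model`."""
    price = SYNC_PRICE_PER_M[model]
    if batch:
        price *= BATCH_DISCOUNT
    return price * tokens / 1_000_000


def savings_vs(baseline_cost: float, new_cost: float) -> float:
    """Fractional saving of `new_cost` relative to `baseline_cost`."""
    return 1 - new_cost / baseline_cost


if __name__ == "__main__":
    tokens = 1_000_000_000  # e.g. a 1B-token corpus
    baseline = embedding_cost("text-embedding-ada-002", tokens)
    batched_small = embedding_cost("text-embedding-3-small", tokens, batch=True)
    print(f"sync ada-002:    ${baseline:,.2f}")
    print(f"batched 3-small: ${batched_small:,.2f}")
    print(f"saving:          {savings_vs(baseline, batched_small):.0%}")
```

Keeping prices in one table makes it easy to rerun the comparison when list prices change.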
Journey Context:
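Synchronous APIs charge the full rate. The Batch API charges half price, but results arrive within a 24h completion window rather than immediately. Critical distinction: cost scales linearly with tokens, but batch size does not scale freely. Each batch file is capped (50,000 requests per file in OpenAI's published limits, with embedding batches also capped at 50,000 total inputs; verify the current numbers), so larger jobs must be split across files. A common error is not trimming prompts before batching: every request carries its own copy of the system prompt and is billed for those tokens, so shorten shared system prompts and drop duplicate items. For embeddings, small vs large is a 6.5x list-price delta ($0.02 vs $0.13 per 1M tokens); batching halves both, to $0.01/1M for small and $0.065/1M for large, which makes batching essential for RAG at scale.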
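A minimal sketch of preparing embedding batch files under these constraints, using only the standard library. It removes duplicate inputs and splits the remainder into files that stay under a per-file request cap. The function name `build_embedding_batches`, the hash-based `custom_id` scheme, and the cap value are assumptions for illustration; the JSONL line shape (`custom_id`, `method`, `url`, `body`) follows the Batch API input format.

```python
import hashlib
import json

MAX_REQUESTS_PER_FILE = 50_000  # assumed per-file cap; confirm against current docs


def build_embedding_batches(texts, model="text-embedding-3-small",
                            max_requests=MAX_REQUESTS_PER_FILE):
    """Return a list of JSONL strings, one per batch file, with duplicate texts removed."""
    seen = set()
    lines = []
    for text in texts:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in seen:
            continue  # identical input: embedding it again only adds billed tokens
        seen.add(key)
        lines.append(json.dumps({
            "custom_id": key[:32],  # hypothetical ID scheme to map results back to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }))
    # Split into chunks so that no single file exceeds the request cap.
    return ["\n".join(lines[i:i + max_requests])
            for i in range(0, len(lines), max_requests)]


if __name__ == "__main__":
    docs = ["alpha", "beta", "alpha", "gamma"]
    for n, payload in enumerate(build_embedding_batches(docs, max_requests=2)):
        with open(f"batch_{n}.jsonl", "w", encoding="utf-8") as fh:
            fh.write(payload + "\n")
```

Each resulting file would then be uploaded with purpose `batch` and submitted against the `/v1/embeddings` endpoint with a `24h` completion window; that step is omitted here because it needs credentials and network access.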
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:27:58.416653+00:00— report_created — created