Report #55139
[cost\_intel] When should I use the OpenAI Batch API vs synchronous requests?
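Use the OpenAI Batch API for any workload that can tolerate up to 24 hours of latency. It offers a 50% cost reduction \(for example, $5.00 vs $10.00 per 1M output tokens for GPT-4o\) and draws on a separate, higher rate-limit pool \(reported here as roughly 10x, 10M tokens/day vs 1M; actual limits depend on account tier\). Do NOT use it for real-time features: 24 hours is the completion window, not a latency guarantee, so results can arrive at any point within it. Ideal for nightly report generation, embedding generation, or bulk classification.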
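The 50% figure can be turned into a quick estimate for a specific workload. The following is a minimal sketch: the function name `token_cost`, the flat per-token price, and the 2B-token monthly volume are illustrative assumptions, and real bills price input and output tokens separately.

```python
def token_cost(tokens: int, price_per_million: float, batch: bool = False,
               batch_discount: float = 0.5) -> float:
    """Return the cost in dollars for `tokens` tokens.

    `batch_discount` defaults to the 50% reduction cited in this report;
    confirm it against current pricing before relying on the result.
    """
    cost = tokens / 1_000_000 * price_per_million
    return cost * (1 - batch_discount) if batch else cost


if __name__ == "__main__":
    monthly_tokens = 2_000_000_000  # hypothetical workload, not from the report
    sync = token_cost(monthly_tokens, 10.00)
    batched = token_cost(monthly_tokens, 10.00, batch=True)
    print(f"sync ${sync:,.2f}  batch ${batched:,.2f}  saved ${sync - batched:,.2f}")
```

Because the discount is a fixed ratio, the absolute savings scale linearly with volume, which is why the case for batching is strongest on the largest offline jobs.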
Journey Context:
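Teams often run high-volume jobs synchronously, hitting rate limits and paying full price. The Batch API, which accepts embedding requests as well as chat completions, processes jobs asynchronously with higher throughput limits at 50% pricing. The tradeoff is strictly latency: jobs complete within 24 hours, typically in 1-6 hours. For RAG index builds, recommendation systems, or any other non-real-time workload, the savings are substantial: in this report's example, indexing 100M vectors costs about $10k instead of $20k \(the absolute figures depend on the embedding model and the tokens per vector; the 2x ratio is what carries over\). The main failure mode is input limits: a batch file that exceeds the per-batch caps is rejected, so large jobs must be chunked into multiple batch files. OpenAI's documentation currently lists caps of 50,000 requests and 200 MB per input file; check the current values before sizing chunks. Quality is identical: the same model weights serve both paths, and embeddings involve no temperature or sampling variance.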
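Chunking a large job into several batch files can be sketched as follows. This is an illustration, not OpenAI's tooling: the helper names `embedding_request` and `chunk_jsonl` are hypothetical, the model name is only an example, and the request and byte caps are passed in as parameters so they can be set from whatever limits the current documentation states.

```python
import json
from typing import Iterable, Iterator


def embedding_request(custom_id: str, text: str,
                      model: str = "text-embedding-3-small") -> dict:
    """Build one request in the Batch API's JSONL line format."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": model, "input": text},
    }


def chunk_jsonl(requests: Iterable[dict], max_requests: int,
                max_bytes: int) -> Iterator[str]:
    """Yield JSONL strings, each within both the request and byte caps."""
    lines: list[str] = []
    size = 0
    for req in requests:
        line = json.dumps(req, ensure_ascii=False) + "\n"
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} alone exceeds max_bytes")
        # Start a new chunk when adding this line would break either cap.
        if lines and (len(lines) >= max_requests or size + line_bytes > max_bytes):
            yield "".join(lines)
            lines, size = [], 0
        lines.append(line)
        size += line_bytes
    if lines:
        yield "".join(lines)


if __name__ == "__main__":
    docs = ["first document", "second document", "third document"]
    reqs = (embedding_request(f"doc-{i}", text) for i, text in enumerate(docs))
    # Tiny caps here only to make the chunking visible.
    for n, chunk in enumerate(chunk_jsonl(reqs, max_requests=2, max_bytes=1_000_000)):
        with open(f"batch_{n:04d}.jsonl", "w", encoding="utf-8") as fh:
            fh.write(chunk)
```

Each resulting file would then be uploaded and submitted as its own batch. Keeping `custom_id` unique across all chunks makes it possible to join the outputs back to the source rows afterwards.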
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:02:31.593956+00:00— report_created — created