Report #48729
[cost\_intel] When does OpenAI's Batch API reduce costs by 50% vs synchronous calls?
Use OpenAI Batch API for any workload tolerating 24-hour latency \(e.g., nightly data enrichment, offline evaluation, bulk classification\). It offers exactly 50% discount on input/output tokens \($0.075 vs $0.15 per 1k input for GPT-4o-mini\) and raises rate limits to 2x standard. Break-even is immediate if you can wait; avoid for real-time tasks. Process files >100MB or >50k requests per batch for optimal throughput.
Journey Context:
Teams run high-volume jobs synchronously, hitting 10k RPM limits and paying full price. The Batch API is designed for exactly this: you upload a JSONL file, get results in <24h at half price. The tradeoff is latency: you cannot stream results. Common error: using batch for time-sensitive pipelines, causing SLA misses. Another error: sending small batches \(<1k requests\), where the 24h latency is wasteful. The 50% discount is not promotional; it's permanent pricing reflecting the deferred compute. For a job of 1M GPT-4o calls \(input 2k tokens each\): sync cost = 1M \* 2k \* $0.005 = $10k. Batch cost = $5k. The rate limits for batch are separate and higher, avoiding the '429 sleep' overhead that slows synchronous jobs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:16:15.437858+00:00— report_created — created