Report #85242
[cost\_intel] When does OpenAI's batching API provide cost savings versus synchronous calls?
Use batching for >10k requests/day when you can tolerate up to 24h of latency; you receive a 50% cost discount \($2.50 vs $5.00 per 1M tokens for GPT-4o\) in exchange for a 24-hour maximum latency.
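The rule of thumb above can be written as a small predicate. This is a minimal Python sketch: the function name `should_use_batch` and the constant names are hypothetical, and the 10k requests/day threshold is this report's heuristic, not a limit enforced by OpenAI. A tolerance of exactly 24h is treated as sufficient, since 24h is the Batch API completion window.

```python
# Minimal sketch of the report's batching rule of thumb (hypothetical names).
BATCH_COMPLETION_WINDOW_HOURS = 24  # Batch API completion window
MIN_REQUESTS_PER_DAY = 10_000       # report's volume heuristic, not an API limit


def should_use_batch(requests_per_day: int, latency_tolerance_hours: float) -> bool:
    """Return True when a workload fits the report's batching rule of thumb."""
    high_volume = requests_per_day > MIN_REQUESTS_PER_DAY
    # The job must tolerate the full completion window, not just typical turnaround.
    tolerates_window = latency_tolerance_hours >= BATCH_COMPLETION_WINDOW_HOURS
    return high_volume and tolerates_window


if __name__ == "__main__":
    print(should_use_batch(50_000, 48))  # high volume, tolerant deadline
    print(should_use_batch(50_000, 2))   # the 'nightly job' that needs results in 2h
    print(should_use_batch(5_000, 48))   # below the volume heuristic
```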
Journey Context:
Standard \(synchronous\) inference charges full price for an immediate response. The Batch API queues jobs and returns results within a 24-hour completion window at a 50% discount, so it is only viable for offline processing \(data enrichment, historical analysis, bulk content generation\). Critical trap: using batch for 'nightly jobs' that actually need results within 2 hours. Batch only targets completion within the 24-hour window, so if the queue is busy or processing is delayed, you miss the SLA. Break-even example: at 10k requests/day averaging 2k output tokens each \(20M output tokens/day\), standard costs $40/day at an illustrative $2 per 1M output tokens, while batch costs $20/day, saving $20/day. If you need results within 4 hours, you must pay full price, and the savings can disappear entirely once you account for rerunning failed batches or maintaining fallback infrastructure.
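The break-even arithmetic above, together with the rerun and fallback caveat, can be sketched as a small cost model. Everything here is hypothetical: the function names, the assumption that every batched request is billed at the discounted rate while the failed or late share is rerun synchronously at full price, and the flat $10/day fallback figure in the usage lines. The $2 per 1M output tokens rate is the report's illustrative rate, not a quoted OpenAI price, and only output tokens are counted, matching the example.

```python
# Hypothetical cost model for the break-even example; rates are illustrative.
BATCH_DISCOUNT = 0.50  # Batch API discount relative to synchronous calls


def standard_daily_cost(requests_per_day: int, output_tokens_per_request: int,
                        price_per_million: float) -> float:
    """Daily cost of synchronous calls, counting output tokens only."""
    tokens_per_day = requests_per_day * output_tokens_per_request
    return tokens_per_day / 1_000_000 * price_per_million


def effective_batch_daily_cost(standard_cost: float, rerun_fraction: float = 0.0,
                               fallback_cost_per_day: float = 0.0) -> float:
    """Batch cost after reruns and fallback infrastructure.

    Assumed model: all batched requests are billed at the discounted rate,
    the failed or late share (rerun_fraction) is rerun synchronously at full
    price, and fallback infrastructure is a flat daily cost.
    """
    batch_cost = standard_cost * (1 - BATCH_DISCOUNT)
    rerun_cost = standard_cost * rerun_fraction
    return batch_cost + rerun_cost + fallback_cost_per_day


if __name__ == "__main__":
    standard = standard_daily_cost(10_000, 2_000, price_per_million=2.0)
    ideal = effective_batch_daily_cost(standard)
    realistic = effective_batch_daily_cost(standard, rerun_fraction=0.10,
                                           fallback_cost_per_day=10.0)
    print(f"standard ${standard:.2f}, batch ${ideal:.2f}, "
          f"savings ${standard - ideal:.2f}")
    print(f"with 10% reruns and $10/day fallback: batch ${realistic:.2f}, "
          f"savings ${standard - realistic:.2f}")
```

Under these assumptions, rerunning 10% of requests synchronously and spending $10/day on fallback infrastructure shrinks the $20/day saving to $6/day; pushing either input higher is how the savings disappear.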
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:39:55.712659+00:00— report_created — created