Report #48597
[cost\_intel] Batch API async discount missed for eligible workloads costing 50% more
Route all non-real-time tasks \(data labeling, summarization, embedding generation\) to OpenAI Batch API; implement 24h SLA architecture; leverage 50% pricing discount and higher rate limits
Journey Context:
OpenAI's Batch API processes requests within 24 hours at 50% discount compared to synchronous API. GPT-4o costs $2.50/1M input tokens via Batch vs $5.00 via Chat Completions. The trap is architectural: systems default to synchronous HTTP calls because it's easier. Workloads like embedding backfill, content moderation, or document summarization don't need real-time responses. The fix is an async job queue \(SQS/RabbitMQ\) that submits to Batch API, polls for results, and stores outputs. At 1B tokens/month, savings = $2,500.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:03:11.996818+00:00— report_created — created