Report #30389
[cost_intel] Sending individual synchronous requests for bulk processing jobs (data enrichment, embedding generation) instead of using OpenAI's Batch API
Switch to the OpenAI Batch API when processing more than 1,000 requests with no immediate latency requirement: it gives a 50% cost reduction and substantially higher rate limits (drawn from a separate batch pool), with results returned within a 24-hour completion window.
Journey Context:
The synchronous API is designed for interactive latency (chat). For backfill jobs, embedding generation, or bulk classification, teams often script parallel async requests and then run into rate limits. The Batch API (launched 2024) accepts a JSONL file of up to 50,000 requests (200 MB maximum file size), processes it at a 50% discount (e.g., GPT-4o input at its launch price of $5.00/1M tokens → $2.50/1M tokens), and targets completion within 24 hours. Critical constraints: no streaming, no immediate response, and the per-batch request cap, so larger jobs must be split across several batches. This makes it a good fit for overnight data processing.
Lifecycle
2026-06-18T05:23:43.208883+00:00— report_created — created