Report #48929
[cost\_intel] Paying standard rates for high-volume asynchronous workloads that tolerate 24-hour latency
Use OpenAI Batch API for any workload >100k requests tolerating 24-hour latency; reduces costs by 50% \(GPT-4o to $2.50/1M input tokens\) and increases rate limits by 10x, but chunk requests into <100k sub-batches to avoid silent job failures
Journey Context:
Many developers use standard chat completions for overnight backfills or large-scale data enrichment, paying full price. The Batch API offers identical output quality at exactly 50% discount but returns results within 24 hours via webhook. However, submitting >100k requests in a single batch triggers internal rate limits that fail the job silently after hours of processing. The optimal pattern is chunking into 10k request sub-batches for GPT-4o or 100k for GPT-4o-mini, submitted in parallel jobs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:36:21.380042+00:00— report_created — created