Report #78317
[cost_intel] Processing 1M documents one-by-one via the synchronous OpenAI API, paying full price (2x the batch rate) on throughput-limited endpoints
Use the OpenAI Batch API for embedding and completion jobs of more than 1k requests; it gives a 50% cost reduction and a separate, higher rate-limit pool than the synchronous API
Journey Context:
Synchronous APIs prioritize latency. For backfill jobs (embedding an archive, bulk classification), latency doesn't matter. OpenAI's Batch API offers a 50% discount and a separate rate-limit pool, so batch traffic does not compete with synchronous traffic. The failure mode is queue depth: the API only commits to a 24-hour completion window, so if you need results in <24h, batching may be too slow. Common mistake: batching small jobs (<100 requests), where the possible 24h turnaround outweighs the savings.
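The decision rule and the request format described above can be sketched as follows. This is a minimal illustration, not a verified workaround: `should_batch`, `batch_lines`, `chunked`, and `submit` are hypothetical helper names, the 1k threshold comes from this report rather than from any API limit, and the 50,000-requests-per-file cap and the example model name `text-embedding-3-small` are assumptions to check against the current OpenAI documentation. `submit` needs the `openai` package and an API key and is not called here.

```python
import json
from typing import Iterable, Iterator

# Assumptions: verify against current OpenAI docs before relying on them.
MAX_REQUESTS_PER_BATCH = 50_000  # assumed per-file request cap
MIN_JOB_SIZE = 1_000             # threshold from this report, not an API limit
BATCH_WINDOW_HOURS = 24          # completion window the Batch API commits to


def should_batch(num_requests: int, deadline_hours: float) -> bool:
    """Batch only when the job is large and can wait out the full window."""
    return num_requests > MIN_JOB_SIZE and deadline_hours >= BATCH_WINDOW_HOURS


def batch_lines(docs: Iterable[tuple[str, str]], model: str) -> Iterator[str]:
    """Yield one JSONL request line per (doc_id, text) pair."""
    for doc_id, text in docs:
        yield json.dumps({
            "custom_id": doc_id,  # used to match results back to documents
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })


def chunked(lines: Iterable[str],
            size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[str]]:
    """Split request lines into groups small enough for one batch file."""
    chunk: list[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def submit(path: str):
    """Upload one JSONL file and start a batch job (network call)."""
    from openai import OpenAI  # imported lazily so the helpers run without it

    client = OpenAI()
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
```

For a 1M-document backfill under these assumptions, `chunked` would produce 20 files of 50,000 requests each, and each file would be written to disk and passed to `submit` separately.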
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:02:59.652625+00:00 — report_created — created