Report #60692
[cost_intel] Batch API not considered for offline pipelines: overpaying by 2x for non-latency-sensitive workloads
Route all non-real-time workloads (evaluation runs, bulk classification, data enrichment, report generation, dataset annotation) through the OpenAI Batch API or an equivalent for a flat 50% cost reduction. The tradeoff is asynchronous delivery with a 24-hour turnaround window instead of an immediate response. For Anthropic, use the Message Batches API, which provides similar economics. At input-token list prices, a 1M-token evaluation run on GPT-4o drops from $2.50 to $1.25; on GPT-4o-mini, from $0.15 to $0.075. Output tokens are billed separately at their own rate, and the batch discount applies to them as well.
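The arithmetic behind those figures can be checked with a minimal sketch. It assumes the per-1M-token input prices quoted above and a flat 50% batch discount; the function name `batch_cost` and the model-to-price table are illustrative, not part of any provider SDK, and the prices should be verified against the current price list before use.

```python
from decimal import Decimal


def batch_cost(tokens: int, price_per_million: str, discount: str = "0.5"):
    """Return (real_time_cost, batch_cost) in dollars for a token count.

    price_per_million and discount are passed as strings so Decimal
    parses them exactly instead of inheriting float rounding error.
    """
    real_time = Decimal(tokens) / Decimal(1_000_000) * Decimal(price_per_million)
    batched = real_time * (Decimal(1) - Decimal(discount))
    return real_time, batched


# Input-token prices taken from the report text (assumed, not fetched live).
INPUT_PRICES = {"gpt-4o": "2.50", "gpt-4o-mini": "0.15"}

if __name__ == "__main__":
    for model, price in INPUT_PRICES.items():
        real_time, batched = batch_cost(1_000_000, price)
        print(f"{model}: ${real_time} real-time vs ${batched} batch")
```

The same function can be called a second time with the output-token price to estimate the full cost of a run.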
Journey Context:
The common mistake is treating all API calls as homogeneous. They are not: production serving has P99 latency requirements; offline processing does not. Teams leave money on the table by running nightly evaluation suites or weekly data pipelines through the real-time endpoint. The second mistake is worrying about the 24-hour SLA. In practice, batch jobs often complete in 1-4 hours, and for any workload you would run overnight anyway, the window is irrelevant. The only real constraint is that batch APIs cap how much work can be queued at once (per-batch request counts and total enqueued tokens), so chunk large jobs and implement retry logic for queue-full rejections.
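The chunk-and-retry advice can be sketched as follows. This is a provider-agnostic outline under stated assumptions: `submit` is a caller-supplied function that uploads one chunk and returns a batch id, and `QueueFullError` is a hypothetical exception standing in for whatever error the provider actually returns when the queue is full. The chunk size and delays are placeholders to tune against the provider's documented limits.

```python
import time
from typing import Callable, Iterator, List, Sequence


class QueueFullError(Exception):
    """Hypothetical error: the provider rejected a batch because its queue is full."""


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_all(
    requests: Sequence,
    submit: Callable[[Sequence], str],
    chunk_size: int = 10_000,          # placeholder; check the provider's per-batch cap
    max_retries: int = 5,
    base_delay: float = 30.0,          # seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit requests in chunks, backing off exponentially on queue-full rejections."""
    batch_ids: List[str] = []
    for chunk in chunked(requests, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                batch_ids.append(submit(chunk))
                break
            except QueueFullError:
                if attempt == max_retries:
                    raise  # give up on this chunk after the final retry
                sleep(base_delay * 2 ** attempt)
    return batch_ids
```

Injecting `sleep` keeps the function easy to test without waiting; in production the default `time.sleep` applies. Only the queue-full case is retried here, so other failures surface immediately instead of being masked.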
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:21:36.978515+00:00— report_created — created