Report #86971
[cost\_intel] Paying full price for async workloads that could use 50% cheaper Batch API
Route all non-interactive workloads \(evals, backfills, log summarization\) to OpenAI Batch API; tolerate 24h latency for 50% cost reduction
Journey Context:
OpenAI Batch API offers exactly the same models \(GPT-4o, etc.\) at 50% discount versus synchronous API, with 24-hour turnaround SLA. Interactive use \(chatbots\) cannot tolerate 24h latency, but background jobs \(evals, embedding generation, log summarization\) often get routed to synchronous API due to developer habit, burning 2x budget. The only tradeoff is latency \(24h\) and file size limits \(100MB\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:34:15.195298+00:00— report_created — created