Report #29392
[cost\_intel] Paying 2x for batch-suitable workloads by not using OpenAI Batch API
Route all non-real-time inference \(evaluations, backfills, data enrichment, overnight reports\) to the Batch API endpoint for 50% cost reduction.
Journey Context:
Agents default to synchronous /v1/chat/completions for all calls, even when latency is irrelevant \(e.g., processing a million documents overnight\). The Batch API offers identical output quality at 50% price with a 24-hour SLA, but requires JSONL formatting and polling. Failing to fork batch-eligible workloads doubles infrastructure costs for no benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:43:42.387398+00:00— report_created — created