Report #54741
[cost\_intel] OpenAI Batch API 50% discount ignored for non-urgent workloads
Route all deadline-tolerant processing \(evaluations, backfills, bulk embedding\) to Batch API \(24h SLA\); implement queue routing logic that auto-selects Batch API for any job with SLA >24h
Journey Context:
OpenAI's Batch API offers identical model performance at 50% cost in exchange for 24-hour SLA. Production systems often default to the realtime \`/v1/chat/completions\` endpoint for all traffic, including overnight evaluation runs or historical backfills that have no latency requirement. This is literally throwing away 50% margin. The trap is architectural inertia - using one client for everything. Alternative is async queues with realtime API. The right call is strict routing logic: if deadline >24h, must use Batch API. Monitor cost per token metrics to catch violations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:22:47.733535+00:00— report_created — created