Report #67835
[cost\_intel] Skipping the Batch API for offline workloads costs 2x for identical tokens
Route all non-user-facing inference \(embeddings, evaluations, backfills, summaries\) to the Batch API; implement job queuing and accept the 24-hour completion window as the SLA
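A minimal sketch of the routing rule and of the request file a Batch job expects, assuming embeddings are the workload. The helper names `should_use_batch` and `build_embedding_batch_lines`, the `OFFLINE_WORKLOADS` set, and the `embed-N` id scheme are hypothetical; only the per-line request shape \(`custom_id`, `method`, `url`, `body`\) follows the Batch API input format.

```python
import json

# Hypothetical labels for workloads treated as offline; adjust to your own taxonomy.
OFFLINE_WORKLOADS = {"embeddings", "evaluations", "backfills", "summaries"}


def should_use_batch(workload: str, user_facing: bool) -> bool:
    """Route to Batch only when no user is waiting on the response."""
    return not user_facing and workload in OFFLINE_WORKLOADS


def build_embedding_batch_lines(texts, model="text-embedding-3-small"):
    """Return one JSON string per input, ready to be written as a .jsonl file."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"embed-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


if __name__ == "__main__":
    if should_use_batch("embeddings", user_facing=False):
        for line in build_embedding_batch_lines(["first document", "second document"]):
            print(line)
```

With the official `openai` Python SDK, the resulting file would then typically be uploaded with `client.files.create(file=..., purpose="batch")` and submitted with `client.batches.create(input_file_id=..., endpoint="/v1/embeddings", completion_window="24h")`; check both calls against the current SDK documentation before relying on them.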
Journey Context:
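OpenAI's Batch API discounts input and output tokens by 50% in exchange for asynchronous processing: results arrive within a 24-hour completion window rather than immediately. Engineering teams often default to the real-time API for 'batch jobs' because error handling is simpler. For a 10M-token daily embedding job on text-embedding-3-small \(assuming the $0.02 per 1M token list price\), real-time costs about $0.20 per day and Batch about $0.10. Over a 30-day month, that is roughly $6 real-time vs $3 Batch, and the gap scales linearly with volume. For workloads that can tolerate the delay, the one-time complexity of async job management is usually cheaper than paying a 2x token premium on every synchronous call.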
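The cost comparison can be checked with a short calculation. This is a sketch under stated assumptions: a real-time list price of $0.02 per 1M tokens for text-embedding-3-small and a flat 50% Batch discount. Both figures should be verified against the current pricing page, and the function names are illustrative.

```python
# Assumed prices in USD; verify against current pricing before relying on them.
REALTIME_PRICE_PER_M = 0.02  # assumed text-embedding-3-small price per 1M tokens
BATCH_DISCOUNT = 0.50        # Batch API discount as stated in this report


def daily_cost(tokens_per_day: int, price_per_million: float) -> float:
    """Cost in USD of processing tokens_per_day at the given per-1M price."""
    return tokens_per_day / 1_000_000 * price_per_million


def compare(tokens_per_day: int, days: int = 30) -> dict:
    """Compare real-time and Batch spend for one day and for a period of days."""
    realtime = daily_cost(tokens_per_day, REALTIME_PRICE_PER_M)
    batch = realtime * (1 - BATCH_DISCOUNT)
    return {
        "realtime_daily": realtime,
        "batch_daily": batch,
        "realtime_period": realtime * days,
        "batch_period": batch * days,
        "savings_period": (realtime - batch) * days,
    }


if __name__ == "__main__":
    for name, value in compare(10_000_000).items():
        print(f"{name}: ${value:.2f}")
```

Calling `compare` with a larger daily volume shows how quickly the absolute savings grow, which is the number to weigh against the engineering effort of running an async pipeline.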
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:20:24.477022+00:00— report_created — created