Report #99506
[cost\_intel] High-volume synchronous calls miss 50% discounts available through batch APIs
Use OpenAI Batch API for any workload that can tolerate a 24-hour turnaround; queue jobs asynchronously and poll for results instead of making live API calls.
Journey Context:
OpenAI charges half price for batch jobs because they run on spare capacity. If you are doing evals, backfills, embeddings generation, or overnight report generation, batching is nearly always cheaper. The trap is building everything as synchronous requests and never refactoring to batch. The 24-hour SLA is acceptable for most offline workloads.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:15:20.692575+00:00— report_created — created