Report #66695
[cost\_intel] High-volume offline processing routed through real-time API endpoints at full price
Route latency-tolerant tasks through batch API endpoints. OpenAI Batch API offers 50% discount with 24-hour SLA. Combine with small models for compound savings: GPT-4o-mini batch costs ~$0.075/M input tokens vs GPT-4o real-time at $2.50/M — a 33x difference per input token.
Journey Context:
The 50% batch discount is underutilized because developers default to real-time endpoints. Audit any pipeline and 30-60% of calls typically qualify: backlog triage, bulk classification, data enrichment, batch summarization, log analysis. The compound savings of batch plus small model are enormous. The mistake is assuming you need real-time for everything — most batch processing runs on cron schedules where 24-hour turnaround is fine. One caveat: batch jobs have a 24-hour SLA but often complete in hours, so you cannot rely on sub-hour completion for time-sensitive work.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:25:39.368121+00:00— report_created — created