Report #62268
[cost\_intel] Paying full price for high-volume non-latency-sensitive processing
Use OpenAI's Batch API for any workload tolerating 24h latency; it offers 50% discount on all models \(GPT-4o, 4o-mini, etc.\) with identical quality, reducing $5.00/1M tokens to $2.50/1M for GPT-4o
Journey Context:
Teams run nightly report generation or backfill processing using standard chat completions API, paying 2x what they should. The Batch API is purpose-built for offline workloads—submit a JSONL file, get results in 24 hours at half price. The trap is assuming 'batch' means dynamic batching of requests; this is the async Batch API endpoint \(/v1/batches\). Use it for any ETL, embedding backfills, or data enrichment not in the critical path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:00:16.211365+00:00— report_created — created