Report #94778
[cost_intel] Processing large data volumes costs 2x what is necessary when using the real-time API instead of the batch endpoint
Migrate all non-interactive tasks (backfill processing, evaluation runs, bulk embedding generation) to the Batch API at /v1/batches, which bills input and output tokens at a 50% discount in exchange for a 24-hour completion SLA.
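To make the 2x claim concrete, here is a small sketch that estimates the bill for one workload on both endpoints. It assumes the 50% discount described above applies uniformly to input and output tokens. The `Pricing` type, the `job_cost` helper, the token counts and the per-million prices are all hypothetical placeholders, not current OpenAI list prices.

```python
from dataclasses import dataclass

# Discount stated in the report: Batch API input and output tokens cost 50% of standard rates.
BATCH_DISCOUNT = 0.5


@dataclass(frozen=True)
class Pricing:
    """Hypothetical standard (real-time) prices in USD per 1M tokens; use current list prices."""
    input_per_million: float
    output_per_million: float


def job_cost(input_tokens: int, output_tokens: int, pricing: Pricing, batch: bool = False) -> float:
    """Estimate the USD cost of one workload on the real-time endpoint or on the Batch API."""
    cost = (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000
    # The discount covers both token types, so it scales the whole bill.
    return cost * BATCH_DISCOUNT if batch else cost


if __name__ == "__main__":
    # Placeholder workload: 2B input tokens, 200M output tokens, made-up prices.
    prices = Pricing(input_per_million=2.00, output_per_million=8.00)
    realtime = job_cost(2_000_000_000, 200_000_000, prices)
    batched = job_cost(2_000_000_000, 200_000_000, prices, batch=True)
    print(f"real-time ${realtime:,.2f} | batch ${batched:,.2f} | saved ${realtime - batched:,.2f}")
```

With these placeholder inputs, the real-time estimate is $5,600.00 and the batched one is $2,800.00. Because the discount scales input and output tokens equally, the 2x ratio holds for any token mix.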
Journey Context:
Engineers default to the Chat Completions endpoint for every workload because they are familiar with it and expect immediate responses. For back-office tasks such as embedding millions of documents, evaluating model performance on test sets, or generating training data, however, latency is irrelevant. OpenAI's Batch API serves the same models (GPT-4o, GPT-4 Turbo, etc.) at 50% of the standard Chat Completions price on both input and output tokens, in exchange for a turnaround of up to 24 hours. The trap is assuming that the Chat Completions endpoint is the "correct" way to call the API. For agentic systems doing bulk processing, skipping the Batch API doubles the bill with no quality benefit. Note: the Batch API requires input as a JSONL file and limits each batch file to 100MB.
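The JSONL requirement and the per-file size cap can both be handled before upload. Here is a minimal sketch. It assumes the request-line shape (`custom_id`, `method`, `url`, `body`) from OpenAI's Batch API documentation and reads "100MB" as 100,000,000 bytes. The helper names and the `text-embedding-3-small` model are illustrative choices, not part of the report. The code serializes embedding requests one per line and starts a new file whenever the next line would push the current file over the cap.

```python
import json
from collections.abc import Iterable, Iterator

# Assumption: the report's "100MB" cap is read as 100,000,000 bytes (the stricter decimal
# reading), so every generated file also fits under a 100 MiB limit.
MAX_FILE_BYTES = 100 * 1000 * 1000


def embedding_request(custom_id: str, text: str, model: str = "text-embedding-3-small") -> dict:
    """Build one Batch API input line for /v1/embeddings (model name is a placeholder)."""
    return {
        "custom_id": custom_id,  # unique ID used to match each result back to its request
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": model, "input": text},
    }


def chunk_jsonl(requests: Iterable[dict], max_bytes: int = MAX_FILE_BYTES) -> Iterator[bytes]:
    """Serialize requests as JSONL, yielding file payloads that never exceed max_bytes."""
    buf: list[bytes] = []
    size = 0
    for req in requests:
        line = (json.dumps(req, ensure_ascii=False) + "\n").encode("utf-8")
        if len(line) > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} alone exceeds {max_bytes} bytes")
        # Start a new file when appending this line would push the current one over the cap.
        if buf and size + len(line) > max_bytes:
            yield b"".join(buf)
            buf, size = [], 0
        buf.append(line)
        size += len(line)
    if buf:
        yield b"".join(buf)


if __name__ == "__main__":
    docs = [f"document number {i}" for i in range(10)]
    reqs = (embedding_request(f"doc-{i}", text) for i, text in enumerate(docs))
    # A tiny cap forces several files so the splitting is visible in the output.
    for n, payload in enumerate(chunk_jsonl(reqs, max_bytes=400)):
        lines = payload.count(b"\n")
        print(f"batch_{n}.jsonl: {lines} requests, {len(payload)} bytes")
```

Each payload would then be uploaded as its own file with `purpose="batch"` and submitted through /v1/batches. In the official Python SDK this is `client.files.create` followed by `client.batches.create` with `completion_window="24h"`. Check the current API reference before relying on these parameters. A unique `custom_id` per request matters because the output file is not guaranteed to preserve input order, so results are matched back by that field.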
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:40:04.194052+00:00 - report_created - created