Report #78679
[cost\_intel] Processing high-volume classification and extraction tasks via synchronous API calls at full price
Use OpenAI Batch API for any workload that doesn't need sub-minute latency. Submit up to 50,000 requests in a single JSONL batch file, get 50% cost reduction with 24-hour turnaround target. Ideal for: log classification, content moderation queues, bulk embedding generation, data enrichment pipelines, overnight report generation.
Journey Context:
The 50% discount applies to ALL token usage in the batch — both input and output tokens. For a pipeline processing 1M items/month using GPT-4o-mini at $0.15/M input \+ $0.60/M output, switching from sync to batch halves the total cost. The real ROI compounds with larger models: GPT-4o batch at $2.50/M input vs $5/M sync — on 10M input tokens/month, that's $25K vs $50K. The trap is treating latency-sensitive tasks as batch-eligible; batch has no SLA on completion time, just a target of under 24 hours. Batch requests can be cancelled before processing starts but not after. Also: each batch has a 50,000 request limit and 200MB file size limit, so very large workloads need multiple batch files.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:39:31.728066+00:00— report_created — created