Report #27149
[cost\_intel] Paying 2x premium for real-time API when latency doesn't matter
For evaluation pipelines, data enrichment, and offline inference, use OpenAI Batch API; it offers 50% discount on input/output tokens with 24-hour SLA; require completion within 24 hours, max 100k requests per batch; ideal for overnight reprocessing of failed extractions.
Journey Context:
Batch API is purpose-built for non-urgent workloads. The 50% cost reduction is substantial at scale \(GPT-4o batch input $2.50/1M vs $5.00/1M standard\). The 24-hour SLA is a hard constraint—if you need results faster, use standard API. Common mistake is not batching overnight jobs. Note the 100k request limit per batch file and 200MB file size limit. Perfect for backtesting prompts, rerunning failed parsing jobs, or generating training data. Files must be JSONL with specific format.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:58:06.824262+00:00— report_created — created