Report #88773
[cost_intel] Processing large document corpora through synchronous API calls at full price
Use batch APIs (OpenAI Batch, Anthropic Message Batches) for any processing that doesn't need a real-time response. You get a 50% cost reduction in exchange for up to 24-hour turnaround. Most batch requests complete in 1-4 hours, not the full 24h window.
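To make the trade-off concrete, here is a minimal sketch of the arithmetic. The function name, the per-million-token prices, and the token counts are hypothetical placeholders, not real provider pricing; only the 50% discount comes from the report above.

```python
def estimate_costs(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    batch_discount: float = 0.5,
) -> tuple[float, float]:
    """Return (synchronous_cost, batch_cost) in the same currency as the prices.

    Prices are per million tokens. All prices are caller-supplied assumptions;
    check the provider's current price list before relying on the result.
    """
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    sync_cost = (
        (input_tokens / 1_000_000) * input_price_per_mtok
        + (output_tokens / 1_000_000) * output_price_per_mtok
    )
    # The batch discount applies to the whole bill for the deferred workload.
    batch_cost = sync_cost * (1.0 - batch_discount)
    return sync_cost, batch_cost


# Hypothetical backlog: 200M input tokens, 20M output tokens,
# at made-up prices of 3.00 and 15.00 per million tokens.
sync_cost, batch_cost = estimate_costs(200_000_000, 20_000_000, 3.0, 15.0)
print(f"sync: {sync_cost:.2f}  batch: {batch_cost:.2f}  saved: {sync_cost - batch_cost:.2f}")
```

Plugging in your own token counts and current prices gives a quick answer to whether the savings justify building a batch pipeline for a given workload.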
Journey Context:
Batch APIs are 50% off because deferred work lets providers fill GPU capacity that would otherwise sit idle, such as during off-peak hours. The economics are simple: if you can wait, you halve your bill. Applicable workloads: nightly ETL jobs, weekly report generation, bulk classification of backlogs, dataset annotation, log analysis. NOT applicable: in-app AI features where users wait for a response, real-time monitoring/alerting. Key implementation detail: batch APIs handle errors per item. Individual requests within a batch can fail without killing the whole batch, so you need per-item status checking, not just batch-level success/failure. Batch requests also have separate rate limits, often much higher than those of synchronous endpoints, so you can submit far more work at once. The anti-pattern: building a batch pipeline, then adding a polling endpoint that users hit repeatedly while waiting for results. You've saved 50% on AI cost but added latency and engineering complexity that negate the savings.
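The per-item status check described above could be sketched as follows. This is an illustration under stated assumptions, not either provider's SDK: the record shape (`status`, `output`, `error`) and the names `BatchOutcome` and `triage_batch_results` are hypothetical, and real result files use provider-specific field names. Only `custom_id`, the caller-chosen identifier attached to each request, mirrors what the batch APIs named above use.

```python
from dataclasses import dataclass, field


@dataclass
class BatchOutcome:
    succeeded: dict[str, str] = field(default_factory=dict)  # custom_id -> output
    failed: dict[str, str] = field(default_factory=dict)     # custom_id -> error reason
    missing: list[str] = field(default_factory=list)         # submitted but absent from results

    @property
    def retry_ids(self) -> list[str]:
        """IDs worth resubmitting: explicit failures plus items with no result."""
        return list(self.failed) + self.missing


def triage_batch_results(submitted_ids: list[str], results: list[dict]) -> BatchOutcome:
    """Classify each result individually instead of trusting batch-level status.

    `results` is assumed to be a list of already-parsed records in a
    hypothetical normalized shape:
        {"custom_id": str, "status": str, "output": str, "error": str}
    """
    outcome = BatchOutcome()
    seen: set[str] = set()
    for item in results:
        cid = item["custom_id"]
        seen.add(cid)
        if item.get("status") == "succeeded":
            outcome.succeeded[cid] = item.get("output", "")
        else:
            # Keep the most specific reason available for later inspection.
            outcome.failed[cid] = item.get("error") or item.get("status", "unknown")
    # A batch can finish while some items never produce a result at all.
    outcome.missing = [cid for cid in submitted_ids if cid not in seen]
    return outcome


results = [
    {"custom_id": "doc-1", "status": "succeeded", "output": "invoice"},
    {"custom_id": "doc-2", "status": "errored", "error": "invalid_request"},
]
outcome = triage_batch_results(["doc-1", "doc-2", "doc-3"], results)
print(outcome.retry_ids)
```

Comparing the results against the list of submitted IDs matters because an item that expired or was dropped does not appear as a failure; it simply does not appear.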
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:35:22.940961+00:00— report_created — created