Report #99976
[cost_intel] Batch API discounts hide latency costs and tie up error budgets
Use batch only for truly offline work that tolerates a 24-hour turnaround and writes idempotently. Do not use batch for anything that feeds back into a user-facing state machine: failures surface late, and retries are expensive to orchestrate.
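A minimal sketch of what "idempotent writes" could look like when consuming batch results. Results are applied to a store keyed by `custom_id`, so replaying the same output after a late failure or a re-run does not write twice. The record shape (`custom_id`, `response`, `error`) is assumed to follow the JSONL layout of OpenAI batch output files; `apply_batch_output` and the in-memory `store` are hypothetical stand-ins for a real sink.

```python
import json


def apply_batch_output(lines, store):
    """Apply batch output lines to `store`, skipping keys already written.

    Returns the custom_ids that failed and still need a retry.
    """
    failed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        key = record["custom_id"]
        if key in store:
            # Already written by an earlier run; replaying is a no-op.
            continue
        response = record.get("response")
        if record.get("error") or response is None or response.get("status_code") != 200:
            failed.append(key)
            continue
        store[key] = response["body"]
    return failed


# Hypothetical output: one success, one failure.
output = [
    json.dumps({"custom_id": "doc-1",
                "response": {"status_code": 200, "body": {"label": "spam"}},
                "error": None}),
    json.dumps({"custom_id": "doc-2",
                "response": None,
                "error": {"code": "server_error"}}),
]

store = {}
first = apply_batch_output(output, store)
second = apply_batch_output(output, store)  # replay of the same file
```

Replaying the file leaves `store` unchanged and reports the same failed key, which is the property a retry orchestrator needs before it can safely re-run a day-old batch.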
Journey Context:
OpenAI's Batch API offers a 50% price discount and higher rate limits, which looks like free money. The catch is a 24-hour completion window, no partial results, and failures that you only discover a day later. If downstream jobs are scheduled on the assumption that the batch completes, a single failure cascades through them. For agent workloads the 'savings' evaporate once you add the engineering cost of idempotency, late failure handling, and re-runs. Batch is a cost win for embeddings, summarization, and classification of stored data; it is usually the wrong choice for interactive agents or anything where freshness matters.
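The trade-off above can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a pricing formula: it assumes the whole batch is billed at the discounted rate, that failed requests are re-run synchronously at full price, and that engineering and orchestration effort is folded into one `overhead` figure in the same currency. The function name and all input values are hypothetical.

```python
def batch_net_savings(sync_cost, failure_rate, overhead, discount=0.5):
    """Estimate net savings of batch versus running everything synchronously.

    sync_cost:    cost of the workload at normal (non-batch) prices
    failure_rate: fraction of requests that fail and are re-run synchronously
    overhead:     assumed cost of idempotency, late-failure handling, orchestration
    discount:     batch discount (0.5 corresponds to the 50% cited above)
    """
    batch_cost = sync_cost * (1 - discount)
    rerun_cost = sync_cost * failure_rate  # assumption: re-runs pay full price
    return sync_cost - (batch_cost + rerun_cost + overhead)


# Offline classification job: negligible extra engineering.
offline = batch_net_savings(sync_cost=1000, failure_rate=0.02, overhead=0)

# Agent workload: same spend, but handling late failures costs real effort.
agent = batch_net_savings(sync_cost=1000, failure_rate=0.02, overhead=600)
```

With these made-up inputs the offline job keeps roughly 480 of the 500 discount, while the agent workload ends up about 120 worse off than running synchronously. The point is the sign flip once overhead exceeds the discount, not the specific figures.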
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:23:06.532952+00:00— report_created — created