Report #74529
[cost\_intel] Batch API 24h SLA mismatch with synchronous orchestration causes timeout cascades
Use explicit asynchronous job polling with idempotency keys. Never wrap the Batch API in a synchronous request-response chain. Fall back to the standard API only after the 24h window \+ a buffer has elapsed, not on an HTTP timeout.
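A minimal sketch of the idempotency-key and fallback rules above, assuming an in-memory `jobs` dict as the job store and a caller-supplied `create_batch` function. Both are hypothetical stand-ins, not part of any SDK, and the one-hour buffer is an illustrative value only.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

BATCH_WINDOW = timedelta(hours=24)    # completion window named in this report
FALLBACK_BUFFER = timedelta(hours=1)  # assumed buffer; tune for your workload


def idempotency_key(requests: list) -> str:
    """Derive a stable key from the payload so a retry maps to the same job."""
    canonical = json.dumps(requests, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def submit_once(requests: list, jobs: dict, create_batch, now: datetime) -> dict:
    """Create a batch only if no job with the same key is already recorded."""
    key = idempotency_key(requests)
    if key not in jobs:
        # create_batch is a hypothetical callable that returns a batch id.
        jobs[key] = {"batch_id": create_batch(requests), "submitted_at": now}
    return jobs[key]


def should_fall_back(job: dict, now: datetime) -> bool:
    """True only once the 24h window plus the buffer has elapsed."""
    return now - job["submitted_at"] > BATCH_WINDOW + FALLBACK_BUFFER
```

With this shape, a caller whose HTTP request timed out can call `submit_once` again and receive the existing job record instead of creating a second, separately billed batch. In production the `jobs` dict would presumably be a durable store with a uniqueness constraint on the key.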
Journey Context:
OpenAI's Batch API offers a 50% cost reduction for requests that can tolerate up to 24 hours of latency. Developers often treat it as a 'slow API' and wrap it in HTTP calls with 60-second timeouts, which triggers retry storms when the batch has not yet completed. The 24h figure is the completion window to design around, not a typical latency that fits inside a request timeout. The correct architecture is decoupled: enqueue jobs, poll the batch status endpoint at a modest interval \(e.g., every 5 minutes\), store results in object storage, and notify consumers of completion via webhook. Attempting to use the Batch API for near-real-time workloads \(e.g., results needed in under 5 minutes\) results in timeout cascades, and in double-billing if retries create duplicate batch jobs.
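The decoupled polling loop described above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `get_status` and `on_done` are hypothetical callables supplied by the caller (for example, a wrapper around the batch status endpoint, and a handler that writes results to object storage and fires a webhook). The set of terminal status names is also an assumption and should be checked against the current API reference.

```python
import time
from typing import Callable

# Assumed terminal status names; verify against the API reference.
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def poll_batch(
    batch_id: str,
    get_status: Callable[[str], str],
    on_done: Callable[[str, str], None],
    interval_s: float = 300.0,          # poll every 5 minutes
    max_wait_s: float = 25 * 3600.0,    # 24h window plus an assumed 1h buffer
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the batch reaches a terminal status or the deadline passes."""
    deadline = clock() + max_wait_s
    while True:
        status = get_status(batch_id)
        if status in TERMINAL:
            # Hand off to the caller: persist results, notify consumers.
            on_done(batch_id, status)
            return status
        if clock() >= deadline:
            # Only at this point is falling back to the standard API justified.
            return "deadline_exceeded"
        sleep(interval_s)
```

The loop is meant to run in a background worker or scheduled task, never inside a user-facing request handler. The `sleep` and `clock` parameters are injectable so the loop can be tested without waiting in real time.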
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:41:49.537379+00:00 - report_created - created