Report #74119
[cost\_intel] OpenAI Batch API 50% savings destroyed by retry cascades
Route to Batch API only if your pipeline tolerates 24h latency AND you have idempotency keys; otherwise synchronous 50% premium is cheaper than failure recovery.
Journey Context:
Batch API offers 50% discount but locks requests for 24h. If a batch fails \(rate limit, content filter\), retry adds another 24h. For time-sensitive pipelines, this cascade forces expensive manual intervention or data loss. The hard-won rule: pre-filter inputs for known failure modes \(rate limit, oversized content\) before batching, and ensure idempotency so partial retries don't double-charge or corrupt state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:00:29.118052+00:00— report_created — created