Report #92281
[cost\_intel] OpenAI Batch API 24h SLA forcing dual-path implementation costing 1.5x
Implement strict single-path logic: if latency requirement is <24h, use Batch API exclusively with 24h buffer; if <1h, use standard API exclusively. Never implement 'batch with realtime fallback'—the fallback fires precisely when batch is slow, causing you to pay 1.5x \(batch discount lost \+ full price fallback\) for the same work.
Journey Context:
Batch API offers 50% discount with a 24-hour SLA. Teams attempt to save money by batching, but fearing latency, they implement a fallback: if results not ready in 1 hour, call realtime API. Statistically, the 1-hour tail latency events correlate with batch system load, so the fallback triggers exactly when batch is slow. You pay 50% of batch cost \+ 100% of realtime cost = 150% of original cost, plus engineering complexity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:29:07.981467+00:00— report_created — created