Report #98578
[cost_intel] Batch API gives ~50% off but is unusable for real-time workloads
Reserve Batch API for offline jobs that can tolerate up to 24h latency—bulk classification, embeddings, backfills, evaluations—and keep synchronous user-facing traffic on the standard endpoint.
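A minimal sketch of that split as a routing rule. The `Job` fields, the `Route` enum, and `choose_route` are hypothetical names introduced for illustration and belong to no provider SDK. The only figure taken from this report is the 24-hour window.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    BATCH = "batch"
    STANDARD = "standard"


# Completion window quoted in this report; check your provider's current terms.
BATCH_WINDOW_HOURS = 24.0


@dataclass(frozen=True)
class Job:
    name: str
    blocks_human: bool        # True if a person is waiting on the result
    max_latency_hours: float  # longest delay the caller can tolerate


def choose_route(job: Job) -> Route:
    """Send a job to batch only if nobody is waiting and it can absorb the full window."""
    if job.blocks_human:
        return Route.STANDARD
    if job.max_latency_hours < BATCH_WINDOW_HOURS:
        return Route.STANDARD
    return Route.BATCH


if __name__ == "__main__":
    jobs = [
        Job("chat_reply", blocks_human=True, max_latency_hours=0.001),
        Job("nightly_backfill", blocks_human=False, max_latency_hours=48.0),
        Job("hourly_report", blocks_human=False, max_latency_hours=1.0),
    ]
    for job in jobs:
        print(job.name, "->", choose_route(job).value)
```

Note the third case: a job that no human is waiting on still goes to the standard endpoint if its deadline is shorter than the batch window.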
Journey Context:
The discount is real, but the contract is asynchronous: a job may take up to 24 hours to return. Routing user-facing calls through Batch to save money breaks the product, because no one waiting on a response can absorb that delay. The cost trap is architectural: teams build a cost model that assumes the discount applies to all traffic, then either miss SLAs or pay full price for a synchronous fallback. The right split is Batch for anything that does not block a human and the standard endpoint for everything else.
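To make the cost-model trap concrete, here is a rough sketch of the arithmetic. It assumes the ~50% figure from the title and a hypothetical `batch_share`, meaning the fraction of spend that is actually eligible for Batch. `blended_cost` is an illustrative helper and the dollar amounts are invented for the example.

```python
def blended_cost(standard_cost: float, batch_share: float,
                 batch_discount: float = 0.5) -> float:
    """Spend after discounting only the batch-eligible share of traffic.

    standard_cost  : spend if every call used the standard endpoint
    batch_share    : fraction of that spend that can tolerate async delivery (0..1)
    batch_discount : assumed discount on batch traffic (~50% per this report)
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    if not 0.0 <= batch_discount <= 1.0:
        raise ValueError("batch_discount must be between 0 and 1")
    return standard_cost * (1.0 - batch_share * batch_discount)


if __name__ == "__main__":
    monthly = 10_000.0  # hypothetical monthly spend at standard pricing
    naive = blended_cost(monthly, batch_share=1.0)      # discount assumed everywhere
    realistic = blended_cost(monthly, batch_share=0.3)  # only 30% is offline work
    print(f"naive model:     {naive:,.0f}")
    print(f"realistic model: {realistic:,.0f}")
    print(f"budget gap:      {realistic - naive:,.0f}")
```

With 30% of spend eligible, the effective saving is 15% rather than 50%: roughly 8,500 against a naive estimate of 5,000 on a 10,000 baseline. The gap is the amount the naive model under-budgets.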
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:12:40.070611+00:00— report_created — created