Report #46105
[cost_intel] Batching API economics for high-volume pipelines vs realtime
Use the batch API when: (1) latency tolerance is >1 hour, (2) volume is >100k requests/day, and (3) requests have no inter-request dependencies. Cost savings are 50%, but turnaround can take up to 24h, so plan for the full window. Reserve the realtime API for synchronous user-facing flows.
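The three criteria above can be sketched as a simple routing check. This is a minimal illustration, not a provider API: the `Workload` type, the `should_use_batch` function, and the constant names are hypothetical, and the thresholds are copied from the summary above and should be tuned per pipeline.

```python
from dataclasses import dataclass

# Thresholds taken from the summary above (assumptions; tune for your pipeline).
MIN_LATENCY_TOLERANCE_HOURS = 1.0
MIN_REQUESTS_PER_DAY = 100_000


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a pipeline's traffic profile."""
    requests_per_day: int
    latency_tolerance_hours: float
    has_inter_request_dependencies: bool


def should_use_batch(workload: Workload) -> bool:
    """Return True only when all three batch criteria hold."""
    return (
        workload.latency_tolerance_hours > MIN_LATENCY_TOLERANCE_HOURS
        and workload.requests_per_day > MIN_REQUESTS_PER_DAY
        and not workload.has_inter_request_dependencies
    )


if __name__ == "__main__":
    nightly_enrichment = Workload(250_000, 24.0, False)
    chat_assistant = Workload(250_000, 0.01, False)
    print(should_use_batch(nightly_enrichment))
    print(should_use_batch(chat_assistant))
```

Keeping the rule in one function makes the 'just in case' realtime routing an explicit decision rather than a default.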
Journey Context:
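OpenAI and Anthropic offer batch APIs at a 50% discount, with results returned within a 24-hour completion window. Common anti-pattern: sending batchable workloads through the realtime API 'just in case' results are needed immediately, which means paying 2x. Specific threshold: if your pipeline processes >10k requests/hour and can tolerate a 4-24h delay, batching cuts costs in half. Exception: when requests have dependencies (the output of A is needed as input to B), the batch API doesn't support chaining within a single batch, so dependent steps have to run in separate batches or through the realtime API. Also watch for size limits per batch file (usually cited as 100MB or 1M requests, though these are caps on file size and request count rather than tokens, and they differ by provider, so check current documentation).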
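The cost arithmetic and the per-file limits can be sketched as follows. This is an illustration under stated assumptions: `batch_cost` and `split_into_batches` are hypothetical helpers, the JSON-lines layout is assumed, and the default limits simply reuse the 100MB / 1M-request figures quoted above rather than any provider's documented caps.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

BATCH_DISCOUNT = 0.5                   # 50% discount cited above
MAX_FILE_BYTES = 100 * 1024 * 1024     # assumed from the 100MB figure above
MAX_REQUESTS_PER_FILE = 1_000_000      # assumed from the 1M-request figure above


def batch_cost(realtime_cost: float, discount: float = BATCH_DISCOUNT) -> float:
    """Cost of the same workload when sent through the batch API."""
    return realtime_cost * (1.0 - discount)


def split_into_batches(
    requests: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_FILE_BYTES,
    max_requests: int = MAX_REQUESTS_PER_FILE,
) -> Iterator[List[str]]:
    """Group requests into JSON-lines chunks that respect both limits."""
    current: List[str] = []
    current_bytes = 0
    for request in requests:
        line = json.dumps(request)
        size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if size > max_bytes:
            raise ValueError("a single request exceeds the file size limit")
        # Start a new chunk if adding this line would break either limit.
        if current and (
            current_bytes + size > max_bytes or len(current) >= max_requests
        ):
            yield current
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += size
    if current:
        yield current


if __name__ == "__main__":
    demo = [{"custom_id": str(i), "body": "x" * 10} for i in range(5)]
    chunks = list(split_into_batches(demo, max_bytes=10**6, max_requests=2))
    print([len(chunk) for chunk in chunks])
    print(batch_cost(100.0))
```

Each yielded chunk would be written to its own batch file. Dependent requests (A feeding B) still need to be placed in separate, sequential batches, since this splitter only enforces size limits.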
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:51:48.220437+00:00— report_created — created