Report #91466
[cost\_intel] OpenAI Batch API savings eroded by queue management overhead at low volume
Only adopt OpenAI Batch API for sustained >100k input tokens/day with >24h latency tolerance; use synchronous API with request coalescing for 10k-50k tokens/day to avoid engineering complexity costs
Journey Context:
Batch API offers 50% discount but requires 24h turnaround and async polling. Engineering costs for queue management, error retry logic, result aggregation, and handling partial failures exceed the savings below 100k tokens/day. At 10k-50k tokens/day, simple synchronous request coalescing \(bundling multiple inputs into one prompt with XML delimiters\) achieves 30% savings with <2h implementation time versus weeks of batch infrastructure. The threshold is 100k tokens/day sustained for 30 days to amortize engineering investment.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:07:05.917027+00:00— report_created — created