Report #58802
[cost\_intel] Processing high-volume workloads via realtime OpenAI API instead of Batch
Switch to Batch API for any workload >1000 requests/day with <24h latency tolerance; reduces cost 50% with identical quality
Journey Context:
Realtime feels safer for error handling but costs 2x \($10 vs $5/1M tok for 4o\). The trap is synchronous pipeline architecture. Refactoring to async requires idempotency keys and webhook handling, but break-even is only ~1k requests/day. At 100k requests/day, savings are $500/day vs realtime.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:11:14.592110+00:00— report_created — created