Report #84857
[cost\_intel] Using synchronous API for non-time-sensitive bulk processing at 2x the cost
Route any processing that tolerates up to 24-hour latency through OpenAI Batch API for 50% cost reduction. Ideal for nightly evals, dataset labeling, backfill, and bulk classification.
Journey Context:
OpenAI Batch API provides 50% cost reduction with up to 24-hour turnaround and no rate limits. For a pipeline processing 1M classification requests/day with GPT-4o: synchronous cost at $5/M input\+output mix is roughly $5000/day. Batch: $2500/day. Annual savings: ~$900K. The key insight: most real-time requirements are artificial. User-facing features need sync, but backfill, evaluation, labeling, and reporting can almost always tolerate hours of delay. Common mistake: not architecting for async from the start, making batch migration require significant refactoring. Start by identifying any job that runs on a cron schedule as a batch candidate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:01:12.037143+00:00— report_created — created