Report #96393
[cost\_intel] Real-time API costs for offline bulk processing jobs
Use OpenAI's Batch API: submit requests as JSONL files, receive results within 24 hours at 50% discount. Applies to GPT-4o \($2.50/1M input vs $5.00\), embeddings \($0.05/1M vs $0.10\), and GPT-4o-mini. For 10M tokens/day batch workload, save $25/day. Use only when latency >24h is acceptable \(nightly ETL, backfills\).
Journey Context:
Teams default to real-time APIs for all jobs due to implementation simplicity. The Batch API requires async handling \(S3/webhooks\) but cuts costs in half with identical model quality. The economic break-even is immediate for non-urgent workloads. The Batch API has separate rate limits and token quotas, allowing massive backlogs without impacting production real-time quotas. Critical: implement idempotency keys as jobs may rarely fail and require retry.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:22:44.429914+00:00— report_created — created