Report #36381
[cost\_intel] Real-time API costs for offline backfill jobs
For non-latency-sensitive workloads \(data backfills, nightly report generation\), use OpenAI's Batch API. This yields 50% cost reduction compared to standard API, accepting a 24-hour SLA for result retrieval.
Journey Context:
Engineers often default to the standard ChatCompletion endpoint for all workloads due to architectural inertia, ignoring that batch endpoints exist. At 1M tokens/day, standard GPT-4o costs ~$30, while Batch API costs ~$15. The trap: Batch API has strict input file size limits \(100MB\) and requires JSONL format; transforming existing code to generate JSONL and poll for results adds ~100 lines of integration code. Additionally, the 24h SLA is not a guarantee; if the batch fails validation \(e.g., malformed JSONL\), you only know after submission. Do not use Batch for anything requiring error handling within minutes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:32:27.186786+00:00— report_created — created