Report #76222
[cost_intel] Processing high-volume embedding or completion jobs via the standard API costs 2× what is necessary
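Use the OpenAI Batch API for any non-realtime workload above 1,000 requests/day: it gives a 50% price reduction in exchange for a turnaround of up to 24 hours. The discount applies from the first request, so there is no break-even volume to reach; the only requirement is that the workload tolerates the latency.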
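As a rough check on the savings claim, the sketch below computes daily and yearly savings from a 50% batch discount. The volume and price are the GPT-4o input figures quoted in the Journey Context; the function name `batch_savings` is hypothetical and exists only for this illustration.

```python
def batch_savings(tokens_per_day: int, standard_per_million: float,
                  discount: float = 0.5) -> tuple[float, float]:
    """Return (daily, yearly) savings in dollars from the batch discount."""
    # Daily spend at the standard (synchronous) rate.
    standard_daily = tokens_per_day / 1_000_000 * standard_per_million
    # The batch discount is a flat fraction of that spend.
    daily = standard_daily * discount
    return daily, daily * 365


# Figures from this report: 10M input tokens/day at $5.00 per 1M tokens.
daily, yearly = batch_savings(10_000_000, 5.00)
print(f"${daily:.2f}/day, ${yearly:,.2f}/year")  # $25.00/day, $9,125.00/year
```

Substitute current list prices before relying on the result, since per-token rates change over time.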
Journey Context:
Synchronous APIs charge the full rate in exchange for an immediate response. For idempotent, checkpointable workloads (e.g., embedding 10M documents), nothing consumes the result in real time, so that premium is wasted. The Batch API offers identical model quality at 50% of the cost, with a maximum turnaround of 24 hours. Cost math: GPT-4o input is $5.00/1M tokens standard and $2.50/1M tokens batch. At 10M input tokens/day, that is $50/day standard versus $25/day batch, a saving of $25/day, or about $9k/year. Common mistake: assuming batch is only for fine-tuning data preparation. It is available for chat completions, embeddings, and moderation. Critical constraint: responses cannot be streamed and partial results cannot be retrieved; design pipelines to tolerate 24h latency and to be idempotent (retry-safe).
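A minimal sketch of a retry-safe embedding batch, assuming the official `openai` Python package and an `OPENAI_API_KEY` in the environment. The helper names (`stable_id`, `write_batch_file`, `submit_batch`), the file name, and the model `text-embedding-3-small` are assumptions chosen for illustration, not part of the original report. Deriving `custom_id` from a content hash means a resubmitted job maps its results back to the same documents.

```python
import hashlib
import json


def stable_id(text: str) -> str:
    """Deterministic custom_id: the same text always yields the same id."""
    return "doc-" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def write_batch_file(texts, path, model="text-embedding-3-small"):
    """Write one JSONL request per unique text in the Batch API input format."""
    seen = set()
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            cid = stable_id(text)
            if cid in seen:  # skip duplicates so each custom_id appears once
                continue
            seen.add(cid)
            request = {
                "custom_id": cid,
                "method": "POST",
                "url": "/v1/embeddings",
                "body": {"model": model, "input": text},
            }
            f.write(json.dumps(request) + "\n")
    return len(seen)


def submit_batch(path):
    """Upload the JSONL file and start a batch with a 24-hour window."""
    # Imported lazily so the file-building step above runs offline.
    from openai import OpenAI

    client = OpenAI()
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )


# Offline step: build the input file (the duplicate text is written once).
count = write_batch_file(
    ["first document", "second document", "first document"],
    "batch_input.jsonl",
)
```

Calling `submit_batch("batch_input.jsonl")` returns a batch object whose status can be polled later with `client.batches.retrieve(batch.id)`. Each line of the output file carries the `custom_id` of its request, which is what lets a pipeline skip documents that already have results when a job is retried.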
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:31:51.378367+00:00— report_created — created