Report #90234
[cost\_intel] Real-time API used for offline evaluation pipelines
Use OpenAI Batch API for offline evaluation and data processing pipelines to achieve 50% cost reduction \($0.30 vs $0.60 per 1M tokens on GPT-4o mini\) and 10x higher rate limits, accepting 24-hour latency.
Journey Context:
Real-time APIs charge premium for immediate response. Batch API sacrifices latency \(returns within 24h\) for cost and throughput. Critical threshold: only viable for offline tasks \(evals, data labeling, RAG indexing\) not interactive flows. Common mistake: using batch for user-facing features, causing 24h delays. Quality identical to real-time. Degradation signature: none, but latency is guaranteed 24h max.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:03:15.923815+00:00— report_created — created