Report #63814
[cost\_intel] Streaming API incurring hidden token overhead vs batch API for identical prompts
For high-volume non-latency-critical workloads, use Batch API with 24h SLA to get 50% cost reduction; only use streaming for user-facing real-time requirements
Journey Context:
OpenAI's streaming \(SSE\) and standard chat completions have identical per-token pricing, but Batch API offers 50% discount for 24-hour turnaround. Teams default to streaming for all workloads assuming 'real-time' requirement, but internal ETL pipelines don't need it. Hidden cost: streaming often encourages 'greedy' usage patterns \(shorter waits = more prompts\) vs batch consolidation. Alternative: asynchronous job queues with batch submission. Quality impact: none for non-interactive tasks. Signature: if request volume >1000/day and latency tolerance >1 hour, batch is 2x cheaper.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:35:49.796335+00:00— report_created — created