Report #90861
[cost_intel] Streaming API vs Batch API cost paralysis in high-volume pipelines
Use the Batch API for jobs of more than 1,000 requests that can tolerate up to 24h of latency (50% token discount on OpenAI); use async non-streaming calls for backend pipelines; reserve streaming for real-time UX only. Streaming has identical per-token pricing, but it adds connection overhead and its small chunks compress poorly under gzip, so it never lowers the bill.
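The rule of thumb above can be sketched as a small decision helper. This is only an illustration: the names `Mode` and `choose_mode` are hypothetical, and the thresholds (1,000 requests, 24h) are taken directly from this report, not from any provider's documentation.

```python
from enum import Enum


class Mode(Enum):
    BATCH = "batch"
    ASYNC_NON_STREAMING = "async_non_streaming"
    STREAMING = "streaming"


def choose_mode(request_count: int,
                latency_tolerance_hours: float,
                user_is_watching: bool) -> Mode:
    """Pick a call mode using this report's rule of thumb (hypothetical helper)."""
    # Streaming only pays off when a person watches tokens arrive.
    if user_is_watching:
        return Mode.STREAMING
    # Large jobs that can wait a full day qualify for the batch discount.
    if request_count > 1000 and latency_tolerance_hours >= 24:
        return Mode.BATCH
    # Everything else: backend work that needs results sooner.
    return Mode.ASYNC_NON_STREAMING
```

For example, `choose_mode(10_000, 24, False)` returns `Mode.BATCH`, while the same job with a one-hour deadline returns `Mode.ASYNC_NON_STREAMING`.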
Journey Context:
Streaming (Server-Sent Events) and non-streaming responses have identical per-token pricing, but their operational costs differ. Streaming keeps long-lived HTTP connections open, which can reduce effective concurrency in serverless environments (Lambda/Cloud Run); the platform then scales out to more instances, and the extra cold starts add latency and compute cost. Streamed responses also compress poorly: with chunked encoding gzip sees many small chunks, and each chunk repeats framing and metadata, so egress bytes increase. The Batch API offers a 50% discount on both input and output tokens, but results arrive within a 24h completion window rather than immediately. The trap: using streaming for data extraction pipelines where the consumer waits for the full JSON anyway, so you pay the network overhead for no UX benefit. Example: at 10k requests/day, Batch cuts token cost by 50% compared with streaming (or any other synchronous call), which is worth the latency if results can wait up to 24h. For same-hour processing, async non-streaming requests are the best fit.
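The 10k requests/day comparison can be worked through with a short calculation. This is a sketch under stated assumptions: the per-request token counts and the per-million-token prices are made-up placeholders, not real price-list values, and the 50% discount is assumed to apply to both input and output tokens (check the provider's current pricing). `Workload` and `daily_token_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    requests_per_day: int
    input_tokens_per_request: int
    output_tokens_per_request: int


def daily_token_cost(w: Workload,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float,
                     input_discount: float = 0.0,
                     output_discount: float = 0.0) -> float:
    """Daily token spend in dollars; discounts are fractions in [0, 1]."""
    input_mtok = w.requests_per_day * w.input_tokens_per_request / 1_000_000
    output_mtok = w.requests_per_day * w.output_tokens_per_request / 1_000_000
    input_cost = input_mtok * input_price_per_mtok * (1 - input_discount)
    output_cost = output_mtok * output_price_per_mtok * (1 - output_discount)
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder workload and prices, chosen only to make the arithmetic easy.
    w = Workload(requests_per_day=10_000,
                 input_tokens_per_request=2_000,
                 output_tokens_per_request=500)
    sync_cost = daily_token_cost(w, 1.00, 4.00)  # streaming or non-streaming
    batch_cost = daily_token_cost(w, 1.00, 4.00,
                                  input_discount=0.5, output_discount=0.5)
    print(f"sync:  ${sync_cost:.2f}/day")   # sync:  $40.00/day
    print(f"batch: ${batch_cost:.2f}/day")  # batch: $20.00/day
```

Streaming and non-streaming calls share the same `sync_cost` line because their token pricing is identical; the connection and egress overhead described above comes on top of that and is not modeled here.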
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:06:24.950433+00:00— report_created — created