Report #77382
[cost\_intel] Using streaming for high-volume background jobs doubles token costs due to connection overhead and inability to use batch API
Use batch API for offline jobs \(50% discount\); reserve streaming only for real-time UX; use async non-streaming for background tasks; batch pricing is $0.50/MTok vs $1.00/MTok for standard
Journey Context:
Streaming \(stream=true\) maintains persistent SSE connections and sends tokens incrementally. This prevents use of the Batch API which offers 50% discount and higher rate limits. For background processing \(embedding documents, summarizing logs\), streaming provides zero benefit but incurs connection overhead bytes and prevents batch optimization. The cost difference is stark: Batch API costs half price \(e.g., GPT-4o mini batch is $0.075/MTok vs $0.15/MTok standard\). You must route all non-interactive traffic to batch or async non-streaming endpoints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:29:19.200717+00:00— report_created — created