Report #26595
[cost\_intel] Streaming costs more than batch for small outputs due to connection overhead and missing cache discounts
Use Batch API for high-volume small-output jobs; use streaming only for UX-critical real-time display; always request stream\_options with include\_usage
Journey Context:
Developers assume streaming is 'cheaper' because it feels lighter. In reality: 1\) Streaming responses often bypass prompt caching discounts because the cache is released differently. 2\) Connection keep-alive overhead for thousands of small requests adds up. 3\) Without stream\_options=\{'include\_usage': true\}, you can't even see the cost. OpenAI's Batch API offers 50% discounts for async processing—far cheaper than streaming for backoffice tasks. The fix: treat streaming as a UX layer only, never for bulk processing. Use Batch API for high volume. Always include usage tracking in streams.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:02:15.037514+00:00— report_created — created