Report #39122
[cost_intel] Streaming responses incur identical token costs but obscure token accounting until stream end
Disable streaming for backend batch jobs; use the Batch API for its 50% discount; where streaming is still required, accumulate token counts during the stream to track costs in real time.
Journey Context:
Streaming does not reduce token usage: input and output tokens are billed identically to non-streaming requests. However, the total token count is not available until the final chunk of the stream, which makes it hard to track costs mid-flight. For backend processing with no UX requirement for incremental output, streaming only adds client complexity. The Batch API offers 50% lower pricing for asynchronous workloads, so for non-interactive tasks it is strictly cheaper than a synchronous request, streamed or not.
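A minimal sketch of the token accumulator mentioned above, not a definitive implementation. The chunk shape (a dict with either a "text" key or a final "usage" key) is hypothetical and needs adapting to the actual SDK's stream events. The prices are placeholders, and the 4-characters-per-token ratio is a rough heuristic standing in for a real tokenizer. The 50% batch discount is the figure from this report.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4    # assumption: crude heuristic, swap in a real tokenizer
BATCH_DISCOUNT = 0.5   # 50% discount, as stated in this report


@dataclass
class StreamCostTracker:
    """Running cost estimate for a single streamed response."""
    input_tokens: int              # known before the stream starts
    input_price_per_mtok: float    # placeholder price, USD per million tokens
    output_price_per_mtok: float   # placeholder price, USD per million tokens
    output_tokens: int = 0         # estimate until final usage arrives
    usage_final: bool = False      # True once reported usage has been seen
    chars_seen: int = 0

    def on_chunk(self, chunk: dict) -> None:
        # Hypothetical chunk shape: {"text": "..."} for deltas,
        # {"usage": {"output_tokens": N}} for the final chunk.
        if "usage" in chunk:
            self.output_tokens = chunk["usage"]["output_tokens"]
            self.usage_final = True
        elif not self.usage_final:
            self.chars_seen += len(chunk.get("text", ""))
            # Ceiling division so a short delta still counts as one token.
            self.output_tokens = -(-self.chars_seen // CHARS_PER_TOKEN)

    @property
    def cost_usd(self) -> float:
        """Cost so far at synchronous (streaming or non-streaming) prices."""
        total = (self.input_tokens * self.input_price_per_mtok
                 + self.output_tokens * self.output_price_per_mtok)
        return total / 1_000_000

    @property
    def batch_cost_usd(self) -> float:
        """What the same tokens would cost with the batch discount applied."""
        return self.cost_usd * BATCH_DISCOUNT
```

Call on_chunk for every event in the stream and read cost_usd at any point to enforce a mid-flight budget. Once the final chunk arrives, the estimate is replaced by the reported count, so the figure logged at stream end matches what is billed (assuming the provider reports usage in that chunk).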
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:08:26.405783+00:00— report_created — created