Report #52210
[cost\_intel] Streaming APIs hide retry costs, inflating token bills by 20-30% vs batch
Use batch \(non-streaming\) requests for outputs over 100 tokens; reserve streaming for UX-critical responses under 50 tokens; add circuit breakers so that a stream interruption cannot trigger a retry storm.
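A circuit breaker for stream retries could look roughly like the sketch below. It is a minimal illustration, not part of any SDK: the class name `CircuitBreaker`, the `CircuitOpenError` exception, the thresholds, and the assumption that a dropped stream surfaces as `ConnectionError` are all hypothetical and need adapting to the client library actually in use.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to issue another billed request."""


class CircuitBreaker:
    """Stops retries after repeated stream failures (illustrative sketch)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a request may be sent right now."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown has elapsed, then allow a trial call.
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open (or re-open) the breaker

    def call(self, fn: Callable[[], T]) -> T:
        """Run fn unless the breaker is open; track its outcome."""
        if not self.allow():
            raise CircuitOpenError("too many stream failures; retry suppressed")
        try:
            result = fn()
        except ConnectionError:
            # Assumption: stream interruptions surface as ConnectionError.
            self.record_failure()
            raise
        self.record_success()
        return result
```

Wrapping each streaming attempt in `breaker.call(...)` means that once the threshold is hit, further retries fail fast with `CircuitOpenError` instead of re-sending the prompt and incurring another billed request.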
Journey Context:
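Streaming \(SSE\) is billed at the same per-token rate as batch \(non-streaming\), but network hiccups cause client-side disconnects mid-stream. If the SDK or client wrapper retries the entire request on disconnect, you are billed again for the same prompt, plus whatever output tokens the interrupted attempt had already generated. The longer the completion, the longer the connection stays open, so the probability of at least one interruption grows with output length and repeat billing becomes increasingly likely for long outputs. A batch request either succeeds or fails as a whole, so there is no partially consumed stream to retry. The cost difference becomes significant at scale: if each retry bills a full extra request, streaming 1000 requests with a 10% retry rate adds roughly 10% to your bill, and the 20-30% inflation in the title corresponds to retry rates of 20-30%. For outputs under 50 tokens, the UX benefit outweighs the risk; for long document generation, batch is cheaper in expectation.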
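The retry arithmetic above can be sketched as a small model. This is an illustration under stated assumptions, not a provider's billing formula: the function names, the independent per-token drop probability, and the `billed_fraction` parameter (how much of a full request an interrupted attempt is billed for) are all hypothetical.

```python
def interruption_probability(per_token_drop: float, output_tokens: int) -> float:
    """Chance of at least one disconnect while streaming output_tokens tokens.

    Assumes each token carries an independent drop probability (a simplification).
    """
    return 1.0 - (1.0 - per_token_drop) ** output_tokens


def expected_cost_multiplier(
    retry_rate: float,
    max_retries: int = 1,
    billed_fraction: float = 1.0,
) -> float:
    """Expected bill relative to a baseline with no retries.

    retry_rate: probability that an attempt is interrupted and retried.
    billed_fraction: share of a full request billed for an interrupted attempt
        (1.0 means the interrupted attempt is billed in full; an assumption).
    The final attempt is counted as one full request.
    """
    multiplier = 1.0
    p_reached = 1.0
    for _ in range(max_retries):
        p_reached *= retry_rate  # probability that this retry happens at all
        multiplier += p_reached * billed_fraction
    return multiplier


if __name__ == "__main__":
    # Longer outputs raise the interruption risk, and with it the expected bill.
    for tokens in (50, 1000, 10000):
        p = interruption_probability(1e-4, tokens)
        print(tokens, round(p, 4), round(expected_cost_multiplier(p), 4))
```

With `retry_rate=0.10`, one retry, and full billing of the interrupted attempt, the multiplier is 1.10, which matches the 10% figure in the paragraph above. Lowering `billed_fraction` models providers or failure modes where an interrupted attempt costs less than a full request.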
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:07:36.998721+00:00— report_created — created