Report #82347
[cost_intel] Streaming chunk overhead adds 20-30% bandwidth cost and causes a logging explosion
Use batch endpoints for non-interactive tasks; implement server-side buffering to reduce chunk frequency; compress logs by aggregating chunks before persistence
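The third workaround, aggregating chunks before persistence, could look roughly like the sketch below. It assumes an in-memory buffer keyed by a request ID and a SQLite table as the log store. The `ChunkAggregator` class, the `responses` table, and its columns are hypothetical names for illustration, not part of any logging product's API.

```python
import sqlite3
from collections import defaultdict
from typing import DefaultDict, List


class ChunkAggregator:
    """Buffer streamed chunks per request and persist one row per finished
    response instead of one row per chunk. The class, table name, and
    schema are hypothetical, chosen only for this sketch."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.pending: DefaultDict[str, List[str]] = defaultdict(list)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "request_id TEXT PRIMARY KEY, chunk_count INTEGER, body TEXT)"
        )

    def on_chunk(self, request_id: str, text: str) -> None:
        # Accumulate in memory; nothing touches the database yet.
        self.pending[request_id].append(text)

    def on_finish(self, request_id: str) -> None:
        # Collapse every chunk for this request into a single row, keeping
        # the chunk count so per-response granularity is not lost.
        chunks = self.pending.pop(request_id, [])
        self.conn.execute(
            "INSERT INTO responses (request_id, chunk_count, body) VALUES (?, ?, ?)",
            (request_id, len(chunks), "".join(chunks)),
        )
        self.conn.commit()


# Usage: 120 streamed chunks for one request end up as one row.
conn = sqlite3.connect(":memory:")
agg = ChunkAggregator(conn)
for i in range(120):
    agg.on_chunk("req-1", f"t{i} ")
agg.on_finish("req-1")
rows = conn.execute("SELECT request_id, chunk_count FROM responses").fetchall()
print(rows)
```

The trade-off with this design: chunks for a request that never finishes stay in memory, and anything buffered is lost if the process exits before `on_finish`, so a production version would likely need a timeout or periodic flush.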
Journey Context:
While token pricing is identical for streaming and batch requests, hidden costs emerge:
(1) SSE protocol overhead adds 15-20% bandwidth per token from the JSON wrappers, such as 'data: {...}', placed around every chunk;
(2) logging systems often store each chunk as a separate database row, inflating storage costs 50-100x for high-frequency outputs;
(3) some enterprise proxies charge per-request fees, which accumulate with every chunk.
The quality tradeoff is latency versus cost. Pattern: for async processing (transcription, summarization), use batch endpoints; for chat, stream but implement server-side buffering (e.g., accumulate 50 ms of tokens before sending) to reduce chunk count by 80% without perceptible latency impact.
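The buffering pattern and the per-chunk framing cost can be illustrated with a small simulation. This sketch assumes tokens arrive as (timestamp in ms, text) pairs and that each SSE event carries a hypothetical `{"delta": ...}` JSON payload; real providers use larger, provider-specific wrappers, so absolute byte counts will differ. The helper names `sse_frame`, `wire_bytes`, and `coalesce` are illustrative only.

```python
import json
from typing import Iterable, List, Optional, Tuple

# (arrival time in milliseconds, token text); a hypothetical input shape.
Token = Tuple[float, str]


def sse_frame(text: str) -> bytes:
    """Wrap one text delta as an SSE event: 'data: <json>' plus a blank line.
    The {"delta": ...} payload shape is an assumption for illustration."""
    payload = json.dumps({"delta": text})
    return f"data: {payload}\n\n".encode("utf-8")


def wire_bytes(chunks: Iterable[str]) -> int:
    """Total bytes on the wire if each chunk is sent as its own SSE event."""
    return sum(len(sse_frame(c)) for c in chunks)


def coalesce(tokens: Iterable[Token], window_ms: float = 50.0) -> List[str]:
    """Group tokens into chunks covering at most window_ms each.

    A chunk opens at its first token's timestamp; any token arriving
    window_ms or later after that starts a new chunk. This approximates
    a server that flushes its buffer on a window_ms timer.
    """
    chunks: List[str] = []
    buf: List[str] = []
    start: Optional[float] = None
    for ts, text in tokens:
        if start is not None and ts - start >= window_ms:
            chunks.append("".join(buf))  # window elapsed: emit buffered text
            buf, start = [], None
        if start is None:
            start = ts  # first token of a new chunk opens the window
        buf.append(text)
    if buf:
        chunks.append("".join(buf))  # flush the remainder at end of stream
    return chunks


# Usage: a hypothetical 200-token response, one token every 10 ms.
tokens = [(i * 10.0, "tok ") for i in range(200)]
per_token = [text for _, text in tokens]
buffered = coalesce(tokens, window_ms=50.0)

print("events:", len(per_token), "->", len(buffered))
print("bytes: ", wire_bytes(per_token), "->", wire_bytes(buffered))
```

At this assumed steady rate of one token every 10 ms, a 50 ms window packs 5 tokens into each event, so 200 tokens go out as 40 events instead of 200, an 80% reduction in chunk count, and the fixed framing bytes are paid 40 times instead of 200. The gain shrinks when tokens arrive more slowly than one per window, since a lone token still pays the full framing cost.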
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:48:33.086960+00:00— report_created — created