Report #72501
[cost_intel] Streaming responses increase effective cost by 20-30% under time-based pricing on serverless compute
Use a batch API for non-interactive workloads; disable streaming for outputs under 500 tokens; account for the compute billed while the function waits for the first token ('time-to-first-token') in serverless billing models.
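The three recommendations above can be read as a simple routing rule. The sketch below is a hypothetical helper, not part of any provider SDK: `choose_delivery_mode`, its mode names, and the `streaming_threshold` parameter are illustrative assumptions, with the 500-token cutoff taken from this report.

```python
from typing import Literal

# Hypothetical delivery modes; map them to whatever your provider actually offers.
DeliveryMode = Literal["batch", "non_streaming", "streaming"]


def choose_delivery_mode(interactive: bool, expected_output_tokens: int,
                         streaming_threshold: int = 500) -> DeliveryMode:
    """Pick how to request a completion, following the report's rules of thumb."""
    # Non-interactive work (nobody is watching tokens arrive) goes to a batch API.
    if not interactive:
        return "batch"
    # Short interactive outputs: per the report, streaming's latency benefit is
    # negligible here, so request the complete response in one piece.
    if expected_output_tokens < streaming_threshold:
        return "non_streaming"
    # Long interactive outputs: streaming still improves perceived latency.
    return "streaming"


print(choose_delivery_mode(interactive=False, expected_output_tokens=2000))  # batch
print(choose_delivery_mode(interactive=True, expected_output_tokens=200))    # non_streaming
print(choose_delivery_mode(interactive=True, expected_output_tokens=1500))   # streaming
```

The expected output length is an estimate you supply (for example, from the `max_tokens` you set), so treat the threshold as a tunable default rather than a hard rule.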
Journey Context:
Many assume streaming is 'free' because it only changes how the HTTP response is delivered. But in serverless/edge environments (Vercel, Cloudflare Workers, AWS Lambda), streaming holds the function execution open for the entire generation. Where compute is billed by duration, as with AWS Lambda's GB-seconds (allocated memory × wall-clock time), a function that streams for 10 s costs about 10× the compute of one that returns in 1 s, even if the token count is identical. In addition, some providers charge a premium for streaming endpoints. For short outputs (<500 tokens), the latency benefit of streaming is negligible, but the added compute cost is significant.
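The duration-based arithmetic above can be sketched as a small estimator. This assumes pure wall-clock GB-second billing; the names `FunctionConfig`, `estimate_duration_s`, and `compute_cost` are hypothetical, and the memory size, throughput, and per-GB-second rate used below are placeholders rather than any provider's published prices.

```python
from dataclasses import dataclass


@dataclass
class FunctionConfig:
    memory_mb: int              # memory allocated to the function
    price_per_gb_second: float  # placeholder rate; check your provider's pricing page


def estimate_duration_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock seconds the function stays open: wait for the first token, then relay the rest."""
    return ttft_s + output_tokens / tokens_per_s


def compute_cost(cfg: FunctionConfig, duration_s: float) -> float:
    """Duration-based cost: GB allocated x seconds open x rate per GB-second."""
    gb = cfg.memory_mb / 1024
    return gb * duration_s * cfg.price_per_gb_second


# Hypothetical 1 GB function at a placeholder rate.
cfg = FunctionConfig(memory_mb=1024, price_per_gb_second=0.00002)

# 0.5 s time-to-first-token plus 400 tokens at 50 tokens/s keeps the function open 8.5 s.
print(estimate_duration_s(ttft_s=0.5, output_tokens=400, tokens_per_s=50.0))  # 8.5

# Same memory, 10 s vs 1 s of billed duration: cost scales linearly with time.
print(f"{compute_cost(cfg, 10.0) / compute_cost(cfg, 1.0):.1f}x")  # 10.0x
```

Because cost is linear in duration at fixed memory, the time-to-first-token wait is billed just like the time spent relaying tokens.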
Lifecycle
2026-06-21T04:16:58.179291+00:00— report_created — created