Report #26397
[cost_intel] Client disconnects during streaming leave you paying for unread tokens
Detect HTTP connection drops immediately and cancel the upstream API request to halt token generation and billing
Journey Context:
With HTTP streaming (Server-Sent Events) for chat completions, the provider generates tokens continuously and bills for every token it generates, whether or not the client reads them. When a user closes the browser tab or a mobile app loses its network, the client's TCP connection closes, but depending on the framework (Node.js, Python, etc.) your server may not notice right away. Meanwhile the LLM keeps generating, up to `max_tokens`, and you are billed for the full sequence even though the user will never see it. This is especially costly for long-form generation (code, documents). In serverless environments such as AWS Lambda, the function keeps running until the LLM finishes or the function times out, so you pay Lambda duration charges on top of the LLM token charges. The fix is to handle connection-close events explicitly (Node.js `res.on('close')`, Python `StreamingResponse` cancellation) and immediately cancel the upstream API request (e.g., `await response.aclose()`, `controller.abort()`), which stops generation at the provider and with it any further token charges.
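The pattern can be sketched with the standard library alone. This is a minimal simulation, not a real provider integration: `FakeUpstream`, `relay`, `demo`, and the `cancel_on_disconnect` flag are hypothetical names invented for illustration, and the `generated` counter stands in for billed tokens. In a real service, the `client_gone` signal would come from the framework's disconnect hook, and `aclose()` would be the HTTP client's own close or abort call.

```python
import asyncio
from typing import Callable, List, Tuple


class FakeUpstream:
    """Hypothetical stand-in for a provider's streaming response (not a real SDK)."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.generated = 0    # tokens the "provider" produced, i.e. billed
        self.closed = False

    def __aiter__(self) -> "FakeUpstream":
        return self

    async def __anext__(self) -> str:
        if self.closed or self.generated >= self.max_tokens:
            raise StopAsyncIteration
        await asyncio.sleep(0)    # yield control, as network I/O would
        self.generated += 1
        return f"tok{self.generated}"

    async def aclose(self) -> None:
        # A real client would close the connection to the provider here.
        self.closed = True


async def relay(
    upstream: FakeUpstream,
    send: Callable[[str], None],
    client_gone: asyncio.Event,
    cancel_on_disconnect: bool = True,
) -> None:
    """Forward tokens to the client; optionally stop upstream when the client is gone."""
    try:
        async for token in upstream:
            if client_gone.is_set():
                if cancel_on_disconnect:
                    break       # stop reading; the finally block closes upstream
                continue        # naive server: keeps draining tokens nobody reads
            send(token)
    finally:
        await upstream.aclose()


async def demo(
    disconnect_after: int, max_tokens: int, cancel_on_disconnect: bool
) -> Tuple[int, int]:
    """Return (tokens delivered to the client, tokens generated upstream)."""
    upstream = FakeUpstream(max_tokens)
    client_gone = asyncio.Event()
    delivered: List[str] = []

    def send(token: str) -> None:
        delivered.append(token)
        if len(delivered) == disconnect_after:
            client_gone.set()   # simulate the browser tab closing

    await relay(upstream, send, client_gone, cancel_on_disconnect)
    return len(delivered), upstream.generated


if __name__ == "__main__":
    print(asyncio.run(demo(3, 1000, cancel_on_disconnect=True)))
    print(asyncio.run(demo(3, 1000, cancel_on_disconnect=False)))
```

In this simulation, the cancelling relay stops one token after the disconnect, because that token was already in flight. The naive relay runs the upstream all the way to `max_tokens`. Real providers may likewise bill for a few tokens generated before the cancellation arrives, so the saving is the remainder of the sequence rather than everything after the disconnect.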
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:42:26.240057+00:00— report_created — created