Report #24805
[cost\_intel] Streaming interruption causing orphaned token generation and double-billing
Implement an AbortController so the HTTP stream is cancelled as soon as the client disconnects; treat max\_tokens as a strict ceiling, not a target; buffer the first 50 tokens before displaying anything, to confirm the stream is still needed.
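A minimal sketch of the disconnect wiring, assuming a relay that sits between the client and the model API. The names `StreamFn`, `ClientConnection`, and `relayWithCancellation` are hypothetical and not taken from any specific SDK; only `AbortController` and `AbortSignal` are standard APIs.

```typescript
// Hypothetical shape of an upstream streaming call: it yields text chunks and
// is expected to stop producing them once `signal` is aborted.
type StreamFn = (signal: AbortSignal) => AsyncIterable<string>;

// Hypothetical view of the client side of the relay.
interface ClientConnection {
  // Registers a callback that fires when the client goes away.
  onClose(cb: () => void): void;
  // Sends one chunk to the client.
  write(chunk: string): void;
}

// Relays chunks to the client and aborts the upstream stream on disconnect.
// Returns the number of chunks that were actually relayed.
async function relayWithCancellation(
  client: ClientConnection,
  startStream: StreamFn,
): Promise<number> {
  const controller = new AbortController();

  // The step that is commonly missed: tie the disconnect event to the abort.
  client.onClose(() => controller.abort());

  let relayed = 0;
  try {
    for await (const chunk of startStream(controller.signal)) {
      if (controller.signal.aborted) break; // stop relaying after a disconnect
      client.write(chunk);
      relayed++;
    }
  } catch (err) {
    // An aborted request usually surfaces as a thrown error; treat it as a
    // normal end of stream and rethrow anything else.
    if (!controller.signal.aborted) throw err;
  }
  return relayed;
}
```

In a Node HTTP handler, `onClose` could map to the request's `close` event, and `startStream` could wrap a `fetch` call that passes `signal` in its options. Both mappings are assumptions about the surrounding stack, and whether the provider stops generating (and billing) when the connection is aborted should be checked against the provider's documentation.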
Journey Context:
When streaming, tokens are billed as they are generated on the server. If a user interrupts the response or closes the browser, the server may keep generating up to max\_tokens before it notices the disconnect, so you pay for tokens that were never received. Common mistake: not wiring client disconnect events to cancellation of the API stream. Alternative: for predictable short outputs \(<200 tokens\) where the added latency is acceptable, use a non-streaming request and avoid the streaming overhead entirely.
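The worst-case exposure described above can be estimated with simple arithmetic. This is a sketch under the assumption that generation runs all the way to max\_tokens after the disconnect; the function names and the price parameter are hypothetical, and no real pricing is implied.

```typescript
// Worst-case number of tokens billed but never delivered, assuming the server
// keeps generating until it reaches maxTokens after the client disconnects.
function worstCaseOrphanedTokens(maxTokens: number, deliveredTokens: number): number {
  return Math.max(0, maxTokens - deliveredTokens);
}

// Cost of those orphaned tokens. pricePerMillionOutputTokens is a
// caller-supplied, hypothetical rate in whatever currency the bill uses.
function worstCaseOrphanedCost(
  maxTokens: number,
  deliveredTokens: number,
  pricePerMillionOutputTokens: number,
): number {
  const orphaned = worstCaseOrphanedTokens(maxTokens, deliveredTokens);
  return (orphaned * pricePerMillionOutputTokens) / 1_000_000;
}
```

With illustrative numbers, a disconnect after 120 delivered tokens leaves up to 3976 orphaned tokens when max\_tokens is 4096, but at most 392 when it is 512, which is why a tight ceiling limits the damage even when cancellation fails.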
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:02:37.417378+00:00— report_created — created