Report #54434
[cost_intel] Streaming timeouts causing double-billing for partial responses
Implement a client-side timeout with request cancellation (for example, an abort controller), use batch (non-streaming) mode for contexts larger than 4k tokens to avoid network-induced retries, and deduplicate via request ID logging.
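A minimal sketch of the timeout, cancellation, and request ID logging steps, assuming an asyncio client. Everything here is hypothetical and for illustration only: `fake_stream` stands in for a real streaming response, `call_with_deadline` and the in-memory `ledger` dict are made-up names, and no real API is called. In Python, task cancellation plays the role an abort controller plays in JavaScript.

```python
import asyncio
import uuid


async def fake_stream(chunks, delay_s, state):
    """Hypothetical stand-in for a streaming API response."""
    try:
        for chunk in chunks:
            await asyncio.sleep(delay_s)  # simulated per-chunk network latency
            yield chunk
    finally:
        # Reached on normal exhaustion and on cancellation. A real client
        # would close the HTTP connection at this point.
        state["closed"] = True


async def _consume(stream, received):
    async for chunk in stream:
        received.append(chunk)


async def call_with_deadline(request_id, chunks, delay_s, timeout_s, ledger):
    """Run one attempt under a hard deadline and log it by request ID."""
    if request_id in ledger:
        # Already attempted: return the logged record instead of resending,
        # so a retry layer cannot silently bill the same request twice.
        return ledger[request_id]

    state = {"closed": False}
    received = []
    try:
        # wait_for cancels the consumer when the deadline passes, which
        # unwinds fake_stream through its finally block.
        await asyncio.wait_for(
            _consume(fake_stream(chunks, delay_s, state), received), timeout_s
        )
        status = "completed"
    except asyncio.TimeoutError:
        status = "cancelled"

    record = {
        "request_id": request_id,
        "status": status,
        "chunks": received,
        "connection_closed": state["closed"],
    }
    ledger[request_id] = record
    return record


if __name__ == "__main__":
    ledger = {}
    rid = str(uuid.uuid4())  # client-generated ID, logged with every attempt
    result = asyncio.run(
        call_with_deadline(rid, ["a", "b", "c", "d"], 0.2, 0.5, ledger)
    )
    print(result["status"], result["chunks"], result["connection_closed"])
```

The ledger makes any resend an explicit decision: a caller that wants to retry a cancelled request would have to issue a new request ID, which leaves both attempts visible in the log for later comparison against billed usage.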
Journey Context:
When a streaming connection drops because of a network blip, many HTTP clients automatically retry requests they treat as idempotent. OpenAI charges for tokens generated before cancellation unless the connection is properly closed via API cancellation, so with streaming a partial response is billable even though the client discards it. The retry is then billed again for the same prompt plus a new completion. Batch mode (non-streaming) returns the complete response atomically, which eliminates partial-generation billing. The alternative, an infinite timeout, exposes services to hung connections and memory leaks.
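The double-billing effect described above can be illustrated with simple token arithmetic. This is a sketch under the report's assumption that partial output is billed. The function name `billed_tokens` and all token counts are hypothetical, and it counts tokens only, ignoring any price difference between input and output tokens.

```python
from typing import Optional


def billed_tokens(
    prompt_tokens: int,
    full_completion: int,
    partial_before_drop: Optional[int] = None,
) -> int:
    """Total tokens billed for one logical request.

    If partial_before_drop is given, the first attempt is assumed to have
    dropped after that many completion tokens and been retried once.
    """
    if partial_before_drop is None:
        return prompt_tokens + full_completion
    dropped_attempt = prompt_tokens + partial_before_drop  # billed, then discarded
    retry_attempt = prompt_tokens + full_completion        # billed again in full
    return dropped_attempt + retry_attempt


clean = billed_tokens(4000, 800)           # 4000 + 800
with_retry = billed_tokens(4000, 800, 300)  # (4000 + 300) + (4000 + 800)
print(clean, with_retry)
```

With these illustrative numbers the retried request bills 9100 tokens against 4800 for a clean run. The prompt is paid for twice, which is why the overhead grows with context size.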
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:51:50.300590+00:00 - report_created - created