Report #99081
[cost\_intel] SDK retries and client timeouts bill for work the application never receives
Set max\_retries explicitly, add circuit breakers, and log provider usage metadata even when the client raises a timeout. Use request\_id to deduplicate retried calls. Avoid very short client timeouts on long-context or reasoning requests where the server may finish and bill after the client has given up.
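As a rough illustration of the circuit-breaker advice above, here is a minimal sketch in plain Python. The names CircuitBreaker, CircuitOpenError, max_failures and cooldown_s are hypothetical and not part of any SDK. It assumes a "failure" is any exception raised by the wrapped call, and it is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without contacting the provider."""


class CircuitBreaker:
    """Hypothetical helper: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                # Still cooling down: fail fast, no request is sent or billed.
                raise CircuitOpenError("circuit open; skipping provider call")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping each provider call in breaker.call(...) means that once the circuit opens, further calls fail locally instead of producing more abandoned, billable requests until the cooldown expires.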
Journey Context:
The OpenAI Python SDK retries failed requests by default (two retries with backoff unless max\_retries is changed), and client-side timeouts are among the failures it retries. Because each failed call can be re-sent up to twice, a 5% transient error rate can add 10-15% to token spend. If your timeout is shorter than model latency, the client abandons the call but the server often completes it and bills in full. These charges appear in provider dashboards but not in application logs. The fix is observability at the proxy or SDK layer, plus retry policies that cap total attempts per time window rather than per request.
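One way to cap retries per time window rather than per request is a shared sliding-window budget. The sketch below is an assumption-laden illustration: RetryBudget, call_with_budget and their parameters are hypothetical names, there is no backoff sleep between attempts, and it presumes the SDK's own retries are disabled (for example max_retries=0 on the client) so the two layers do not multiply.

```python
import time
from collections import deque


class RetryBudget:
    """Hypothetical helper: allows at most `max_retries` retries per sliding window."""

    def __init__(self, max_retries=10, window_s=60.0, clock=time.monotonic):
        self.max_retries = max_retries
        self.window_s = window_s
        self.clock = clock  # injectable for testing
        self._stamps = deque()  # timestamps of retries granted inside the window

    def try_acquire(self):
        now = self.clock()
        # Drop retry timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_retries:
            return False
        self._stamps.append(now)
        return True


def call_with_budget(fn, budget, max_attempts=3):
    """Call fn; retry on exception only while the shared budget allows it."""
    attempt = 1
    while True:
        try:
            return fn()
        except Exception:
            # Give up if this request hit its own cap or the global budget is spent.
            if attempt >= max_attempts or not budget.try_acquire():
                raise
            attempt += 1
```

Sharing one RetryBudget across all callers bounds the worst-case extra spend during an outage: once the window's retries are used up, failures surface immediately instead of each request independently re-sending work the server may still bill for.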
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:16:33.018380+00:00— report_created — created