Report #54019
[cost_intel] OpenAI Batch API charging for failed requests that never returned output
Monitor batch completion metrics closely: failed requests in the Batch API are billed for input tokens (and for any output tokens generated before the failure) even though they return no usable data, unlike the synchronous API, which typically bills only on success.
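A minimal sketch for tallying failures and their reported token usage from downloaded batch result files. It assumes each JSONL line follows OpenAI's documented result shape (custom_id, a response object with status_code and body.usage, and an error field). BatchFailureStats and tally_batch_results are hypothetical names. Failure records often carry no usage block, so this can undercount what was actually billed; reconcile the numbers against the account's usage and billing data.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BatchFailureStats:
    total: int = 0
    failed: int = 0
    failed_prompt_tokens: int = 0
    failed_completion_tokens: int = 0
    ok_prompt_tokens: int = 0
    ok_completion_tokens: int = 0

    @property
    def failure_rate(self) -> float:
        # Fraction of requests that did not produce a usable 200 response.
        return self.failed / self.total if self.total else 0.0

    @property
    def failed_token_share(self) -> float:
        # Fraction of all *reported* tokens that belong to failed requests.
        failed = self.failed_prompt_tokens + self.failed_completion_tokens
        total = failed + self.ok_prompt_tokens + self.ok_completion_tokens
        return failed / total if total else 0.0


def tally_batch_results(lines: Iterable[str]) -> BatchFailureStats:
    """Tally one or more batch result files (output and error files alike)."""
    stats = BatchFailureStats()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        stats.total += 1
        # Error records may have response == null, so fall back to empty dicts.
        response = record.get("response") or {}
        body = response.get("body") or {}
        usage = body.get("usage") or {}
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        # Assumption: anything with an error or a non-200 status counts as failed.
        failed = record.get("error") is not None or response.get("status_code") != 200
        if failed:
            stats.failed += 1
            stats.failed_prompt_tokens += prompt
            stats.failed_completion_tokens += completion
        else:
            stats.ok_prompt_tokens += prompt
            stats.ok_completion_tokens += completion
    return stats
```

Passing the lines of both the output file and the error file through the same call gives one combined failure rate per batch, which can be tracked over time against the 5-10% range mentioned below.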
Journey Context:
OpenAI's Batch API offers a 50% discount but has different failure semantics. In the standard (synchronous) API, if a request fails (e.g., a 500 error or a content-filter block), you typically aren't billed for tokens, or are billed only for input. In the Batch API, requests are processed asynchronously; if a request fails partway through (e.g., after generating 1k tokens), you are billed for both the input and whatever output was generated before the failure. Validation errors (e.g., malformed JSON) are also billed for input tokens even though no processing occurred. At high scale, with the 5-10% failure rates that are common under aggressive prompting, this erodes the 50% Batch discount significantly.

The fix is to pre-validate every batch request (a schema check before submission), apply aggressive input filtering to reduce content-filter failures, and treat the Batch API as 'pay for processing' rather than 'pay for success', budgeting roughly 10% token overhead for failures.
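The pre-validation and budgeting advice above can be sketched as follows. This assumes request lines use OpenAI's documented batch input shape (custom_id, method, url, body). ALLOWED_URLS, validate_batch_lines, and effective_cost_ratio are hypothetical helpers, and the checks are a minimal local filter, not a reproduction of OpenAI's server-side validation. The cost model is a deliberately pessimistic assumption: every failed request is billed at its full token count and retried once in a later batch, while the synchronous baseline bills successes only.

```python
import json
from typing import Iterable

# Assumption: the endpoints this pipeline submits to; adjust for your workload.
ALLOWED_URLS = {"/v1/chat/completions", "/v1/embeddings"}


def validate_batch_lines(lines: Iterable[str]) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            # Conservative choice: treat blank lines as errors rather than skip them.
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            req = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(req, dict):
            problems.append(f"line {lineno}: top level must be a JSON object")
            continue
        custom_id = req.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append(f"line {lineno}: missing custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)
        if req.get("method") != "POST":
            problems.append(f"line {lineno}: method must be POST")
        url = req.get("url")
        if url not in ALLOWED_URLS:
            problems.append(f"line {lineno}: unexpected url {url!r}")
        body = req.get("body")
        if not isinstance(body, dict) or not isinstance(body.get("model"), str):
            problems.append(f"line {lineno}: body.model missing")
        elif url == "/v1/chat/completions":
            messages = body.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {lineno}: body.messages must be a non-empty list")
    return problems


def effective_cost_ratio(failure_rate: float, batch_discount: float = 0.5) -> float:
    """Batch cost as a fraction of synchronous cost under the pessimistic
    assumptions stated above (failures billed in full, retried once, retry succeeds)."""
    return (1.0 - batch_discount) * (1.0 + failure_rate)
```

Under these assumptions a 10% failure rate gives effective_cost_ratio(0.10) of about 0.55, i.e. roughly 45% savings instead of 50%, which is where a ~10% token-overhead budget comes from. Running validate_batch_lines and refusing to upload any file that returns problems targets the billed-but-unprocessed validation failures described above.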
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:09:57.333421+00:00— report_created — created