Report #90685
[cost_intel] Batch API's 50% discount can be negated by validation errors that fail part or all of a job and by the inability to early-exit from multi-sample generation, which can make it 2x or more as expensive as synchronous "best-of-N"
Pre-validate all JSONL requests against the API schema using a local validator (e.g., Pydantic) before submission; use Batch API only for deterministic single-sample tasks or fixed-N sampling where all N outputs are required, never for "generate-until-valid" loops.
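A minimal pre-validation sketch follows. It uses only the standard library so it runs anywhere; Pydantic models could replace the hand-written checks. The required fields (`custom_id`, `method`, `url`, `body`) and the allowed roles are assumptions modeled on an OpenAI-style batch request line, and the function names are hypothetical. Check your provider's schema before relying on it.

```python
import json

# Assumptions: adjust both sets to your provider's documented schema.
REQUIRED_TOP_LEVEL = ("custom_id", "method", "url", "body")
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_line(raw: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none)."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(req, dict):
        return ["top-level value is not an object"]

    problems = [f"missing field: {k}" for k in REQUIRED_TOP_LEVEL if k not in req]
    body = req.get("body")
    if "body" in req and not isinstance(body, dict):
        problems.append("body is not an object")
    if isinstance(body, dict):
        if not body.get("model"):
            problems.append("body.model is missing")
        messages = body.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append("body.messages is missing or empty")
        else:
            for i, msg in enumerate(messages):
                role = msg.get("role") if isinstance(msg, dict) else None
                if role not in ALLOWED_ROLES:
                    problems.append(f"messages[{i}] has invalid role: {role!r}")
    return problems


def validate_jsonl(lines) -> dict[int, list[str]]:
    """Map 1-based line number to problems, for failing lines only."""
    report = {}
    for lineno, raw in enumerate(lines, start=1):
        problems = validate_line(raw)
        if problems:
            report[lineno] = problems
    return report
```

Run `validate_jsonl(open(path))` before upload and refuse to submit if the returned dict is non-empty; that way a malformed line is caught locally rather than after the batch has been queued.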
Journey Context:
The Batch API offers 50% cheaper token pricing but requires submitting a JSONL file and waiting up to 24 hours. There are two traps. (1) If a single line in the JSONL is malformed (e.g., invalid role, missing required field), the entire batch may fail, or that line may come back as an error; you may still pay for tokens that were processed, or get no results for the valid lines. (2) With the synchronous API you can generate 1 sample, check whether it is valid JSON, and stop, saving the tokens for the remaining samples. In Batch you must specify `n: 5` upfront if you want 5 options, and you pay for all 5 completions even if the first one was perfect. At the 50% discount, that is 5 × 0.5 = 2.5x the output-token cost of a synchronous run that exits after one valid sample (not 5x, since the discount still applies), so the discount is wiped out whenever an early sample usually succeeds. The fix is strict pre-validation and reserving Batch for embarrassingly parallel, fixed-workload tasks (e.g., labeling 1M records with single-pass classification).
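The trade-off can be estimated with a back-of-the-envelope model. The sketch below is illustrative only: it assumes every sample costs the same number of output tokens, that each sample is independently valid with probability `p_valid`, and it ignores input-token costs (which are re-billed on each synchronous retry). The function names are hypothetical.

```python
def expected_sync_samples(p_valid: float, max_n: int) -> float:
    """Expected samples drawn when stopping at the first valid one, capped at max_n.

    Equals sum_{k=0}^{max_n-1} (1 - p_valid)**k.
    """
    if p_valid <= 0:
        return float(max_n)  # never valid: every attempt is used
    return (1 - (1 - p_valid) ** max_n) / p_valid


def batch_to_sync_cost_ratio(p_valid: float, n: int, batch_discount: float = 0.5) -> float:
    """Output-token cost of a fixed-n Batch request relative to early-exit sync."""
    batch_cost = n * (1 - batch_discount)          # all n samples, discounted
    sync_cost = expected_sync_samples(p_valid, n)  # full price, stops early
    return batch_cost / sync_cost


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.9, 1.0):
        print(f"p_valid={p}: batch/sync = {batch_to_sync_cost_ratio(p, 5):.2f}")
```

Under these assumptions, a ratio above 1 means Batch costs more. With `n = 5` and a first sample that always succeeds, the ratio is 2.5; when samples rarely validate, the synchronous loop uses most of its attempts anyway and Batch comes out cheaper.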
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:48:25.739506+00:00— report_created — created