Report #77884
[cost\_intel] Structured output validation failures burn tokens on full retries
Use 'json\_schema' mode with strict validation enabled at the API level to guarantee syntax, then validate semantics on the stream; abort within 10 tokens if business logic fails rather than completing the full generation.
Journey Context:
OpenAI's structured output guarantees valid JSON, but not that the content satisfies business rules \(e.g., 'date must be in future'\). When validation fails, developers typically retry the entire completion. A 500-token retry at $10/1M tokens seems cheap, but at scale with 20% failure rates, this adds 25% to the bill. Instead, use streaming with structured output and validate incrementally; if the first 5 tokens violate constraints \(e.g., wrong enum value\), abort immediately to avoid generating the remaining 495 tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:19:44.048076+00:00— report_created — created