Report #80407
[cost\_intel] Cost per successful structured output extraction is 3-5x the raw generation cost
Use native constrained decoding \(OpenAI Structured Outputs, Gemini constrained decoding, llama.cpp grammars\) instead of 'generate-then-validate' patterns. OpenAI JSON mode on its own guarantees only syntactically valid JSON, not conformance to a schema. Pre-validate schemas for impossibilities \(contradictory constraints\) before generation to avoid guaranteed failures.
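The schema pre-validation step could look something like the sketch below. It is not a complete JSON Schema satisfiability checker; it only flags a few contradictions that are cheap to detect \(inverted bounds, an empty enum, a required property that a closed object never declares\). The function name `find_contradictions` and the example schema are hypothetical, chosen for illustration.

```python
from typing import Any

# Pairs of JSON Schema keywords where the lower bound must not exceed the upper.
_BOUND_PAIRS = [
    ("minimum", "maximum"),
    ("minLength", "maxLength"),
    ("minItems", "maxItems"),
    ("minProperties", "maxProperties"),
]


def find_contradictions(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return descriptions of constraints that no value at that path can satisfy.

    Hypothetical helper: a lint pass, not a full satisfiability check.
    Contradictions inside optional properties are reported too.
    """
    problems: list[str] = []

    # Inverted numeric, length, and size bounds.
    for lo_key, hi_key in _BOUND_PAIRS:
        lo, hi = schema.get(lo_key), schema.get(hi_key)
        if lo is not None and hi is not None and lo > hi:
            problems.append(f"{path}: {lo_key}={lo} exceeds {hi_key}={hi}")

    # An empty enum admits no value at all.
    if "enum" in schema and len(schema["enum"]) == 0:
        problems.append(f"{path}: enum is empty")

    # A closed object cannot contain a required property it never declares.
    props = schema.get("properties", {})
    closed = (
        schema.get("additionalProperties") is False
        and "patternProperties" not in schema
    )
    if closed:
        for name in schema.get("required", []):
            if name not in props:
                problems.append(
                    f"{path}: required property '{name}' is not declared "
                    "but additionalProperties is false"
                )

    # Recurse into nested object properties and array items.
    for name, sub in props.items():
        if isinstance(sub, dict):
            problems.extend(find_contradictions(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(find_contradictions(items, f"{path}[]"))

    return problems


# Illustrative schema with three deliberate contradictions.
example_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 18, "maximum": 10},
        "tags": {"type": "array", "items": {"type": "string", "enum": []}},
    },
    "required": ["age", "email"],
    "additionalProperties": False,
}

for issue in find_contradictions(example_schema):
    print(issue)
```

Running a check like this before any generation call means an unsatisfiable schema is rejected once, locally, instead of burning a full retry budget on requests that cannot succeed.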
Journey Context:
In 'generate-then-validate' patterns \(Pydantic/Zod validation\), every schema violation triggers a retry, and each retry pays for the full input context again plus new output tokens. If attempts fail independently at a 20% rate, as they can with complex schemas, the expected number of attempts is 1 / \(1 - 0.2\) = 1.25, so expected cost is about 1.25x base. When failed attempts are appended to the history, each retry also carries a longer prompt, and costs can spiral to 3-5x. Native constrained decoding masks invalid tokens at sampling time, so the output matches the schema by construction and no validation retry is needed, provided generation is not cut off by the token limit. The savings come from eliminating retries and avoiding the 'validation tax' of sending failed attempts back into context.
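The retry arithmetic above can be sketched as a small cost model. Everything here is an assumption for illustration: failures are treated as independent with a fixed rate, prices are relative units \(output priced at 4x input\), and the function name `cost_multiplier_per_success` and its parameters are hypothetical. It returns the expected spend per successful extraction divided by the cost of a single attempt.

```python
def cost_multiplier_per_success(
    input_tokens: int,
    output_tokens: int,
    failure_rate: float,
    *,
    input_price: float = 1.0,   # assumed relative price per input token
    output_price: float = 4.0,  # assumed relative price per output token
    error_tokens: int = 0,      # validator feedback appended per failed attempt
    accumulate_context: bool = False,
    max_attempts: int = 10,
) -> float:
    """Expected cost per successful extraction, relative to one attempt."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    base = input_tokens * input_price + output_tokens * output_price

    expected_spend = 0.0
    reach_prob = 1.0  # probability that attempt k is made at all
    for k in range(max_attempts):
        # With accumulation, each earlier failed output (plus its error
        # message) is replayed as input on every later attempt.
        replayed = k * (output_tokens + error_tokens) if accumulate_context else 0
        attempt_cost = (
            (input_tokens + replayed) * input_price
            + output_tokens * output_price
        )
        expected_spend += reach_prob * attempt_cost
        reach_prob *= failure_rate

    # Some runs exhaust the retry budget; spread total spend over successes.
    success_prob = 1.0 - failure_rate ** max_attempts
    return expected_spend / success_prob / base


plain = cost_multiplier_per_success(1000, 500, 0.2)
accumulated = cost_multiplier_per_success(
    1000, 500, 0.2, error_tokens=100, accumulate_context=True
)
print(f"stateless retries:   {plain:.3f}x")
print(f"accumulated context: {accumulated:.3f}x")
```

Under these assumptions the multiplier depends heavily on the failure rate and on how large the replayed attempts are relative to the original prompt, so it is worth plugging in measured token counts and failure rates rather than relying on the illustrative defaults.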
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:33:54.538776+00:00— report_created — created