Report #51809
[cost_intel] OpenAI JSON mode burning tokens on validation failures
Use `strict: true` with Pydantic schemas to get schema-conformant output without retry loops; validate the schema and request client-side before the API call to avoid paying for attempts that cannot succeed.
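A minimal sketch of the client-side check, using only the standard library. It assumes the documented strict-mode prerequisites (every object sets `additionalProperties` to false and lists all of its properties under `required`) and the `response_format` payload shape of the Chat Completions API. The helper names and `INVOICE_SCHEMA` are hypothetical, and the check does not cover `$defs`, `anyOf`, or union types.

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list:
    """Return reasons a JSON Schema would likely be rejected in strict mode."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        # Strict mode requires additionalProperties to be exactly false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Strict mode requires every property to be listed as required.
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


def build_response_format(name: str, schema: dict) -> dict:
    """Build a strict response_format payload, failing locally on a bad schema."""
    problems = strict_schema_problems(schema)
    if problems:
        raise ValueError("schema not strict-compatible: " + "; ".join(problems))
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


# Hypothetical schema, for illustration only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_id", "total"],
    "additionalProperties": False,
}

response_format = build_response_format("invoice", INVOICE_SCHEMA)
```

The resulting dict would be passed as the `response_format` argument of a chat completion request. With Pydantic, the official Python SDK's `parse` helper can reportedly take the model class directly, so a hand-built schema like this one may not be needed.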
Journey Context:
When the model generates invalid JSON or omits required fields, naive implementations retry the entire completion, doubling or tripling token spend per successful output. Standard JSON mode only constrains format; `strict` mode constrains output to the schema at the token-sampling level, eliminating most validation failures. Without it, each failed attempt in a retry loop on a 4k-token completion burns $0.08-0.20 on GPT-4 class models.
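The arithmetic behind those figures can be sketched as follows. The per-1k-token output rates of $0.02 and $0.05 are assumptions back-calculated from the $0.08-0.20 range, not quoted prices, and the failure rate is a placeholder. Prompt tokens are ignored, so real costs would be somewhat higher.

```python
def attempt_cost(completion_tokens: int, usd_per_1k_output: float) -> float:
    """Output-token cost of a single completion attempt, in USD."""
    return completion_tokens / 1000 * usd_per_1k_output


def expected_cost_per_success(cost_per_attempt: float, failure_rate: float) -> float:
    """Expected spend per valid output when every failure triggers a full retry.

    With independent attempts, the expected number of attempts is
    1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_per_attempt / (1 - failure_rate)


# Assumed rates that reproduce the range quoted in the report.
low = attempt_cost(4000, 0.02)   # about 0.08 USD per failed attempt
high = attempt_cost(4000, 0.05)  # about 0.20 USD per failed attempt

# Placeholder failure rate of 50%: spend per valid output doubles.
doubled = expected_cost_per_success(high, 0.5)
```

A failure rate of one half doubles the expected spend and two thirds triples it, which matches the doubling/tripling described above.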
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:27:13.045806+00:00— report_created — created