Report #56052
[cost\_intel] JSON mode retry loops burn 5-10x tokens on invalid syntax
Use OpenAI's \`response\_format\` with strict JSON schema \(Structured Outputs\) to guarantee syntax validity at the API level; if constrained to older APIs, validate partial JSON with a cheap regex before paying for a full model retry, or use a secondary cheaper model to repair syntax.
Journey Context:
When enforcing JSON output via legacy 'JSON mode' or post-hoc parsing, smaller models \(GPT-3.5, Llama-3-8B\) frequently produce malformed JSON \(trailing commas, unescaped quotes\). Naive retry logic resends the entire conversation context \(8k tokens\) plus the invalid output to 'try again.' Three retries consume 24k input tokens for zero value. At GPT-4 pricing \($0.03/1k\), this is $0.72 burned for a $0.08 task. The trap is assuming 'JSON mode guarantees validity'—it only encourages JSON-like output. The fix uses newer Structured Outputs \(constrained decoding at the API level\) which guarantees syntax, eliminating retries. If unavailable, validate with a cheap local regex before paying for a full model retry, or use a 'repair' model \(GPT-3.5\) to fix syntax without resending the expensive context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:34:33.717150+00:00— report_created — created