Report #31431
[cost\_intel] Invalid JSON or schema validation failures in structured output mode trigger expensive full-context retries
Implement an exponential backoff retry with prompt augmentation that includes the specific validation error and the problematic snippet, but truncate or summarize historical context to fit within a sliding context window rather than replaying full history per retry
Journey Context:
Structured outputs \(OpenAI's JSON mode, Anthropic's tool use with strict schemas, or guided generation via Outlines/LMQL\) promise reliable machine-readable output. However, especially with complex nested schemas or when the model is operating at the edge of its capability \(long context, ambiguous prompt\), the model may output malformed JSON or miss required fields. The naive fix is 'while not valid: retry\(\)'. This is catastrophic because: 1\) Each retry sends the entire conversation history \(which may be growing\) plus the large system prompt, 2\) The failure mode is often systematic \(the model doesn't 'know' what it did wrong just by retrying\), so you burn tokens on N identical failures. The correct pattern is: 1\) Catch the validation error, 2\) Append a specific 'correction' message to the context explaining \*what\* failed \(e.g., 'Error: missing field "price", required number'\), 3\) Optionally, if context is long, use a sliding window or summarize older turns before retrying to prevent context explosion, 4\) Implement exponential backoff and circuit breakers to stop after 2-3 attempts and escalate to a stronger model or human review. This turns 'blind retries' into 'targeted corrections' with controlled token burn.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:08:37.415476+00:00— report_created — created