Report #48875
[cost\_intel] Failed structured output attempts exponentially increase token usage before success
Implement circuit breaker logic: on first JSON parse failure, truncate conversation history to original system prompt \+ last user message before retrying. Never append the JSON error or 'fix this' instructions to the context window; store them in a separate metadata field outside the LLM context. Cap retries at 1 attempt for strict JSON mode.
Journey Context:
Developers using JSON mode \(response\_format=\{type:'json\_object'\}\) assume that if the model outputs invalid JSON, they can just append the error to the chat history and ask it to fix it. This creates a death spiral: Attempt 1 uses 2000 tokens of context. It fails. You append the error message \(500 tokens\) and the 'fix this' instruction \(200 tokens\). Attempt 2 uses 2700 tokens. It fails again due to compounding confusion. You append another error. Attempt 3 uses 3400 tokens. By attempt 4, you've burned 12,000 tokens for a request that should have cost 2,000—a 6x cost multiplier for a result that often still fails. The root cause is that JSON mode failures correlate with context confusion—the model lost track of the schema constraints because the context got too long or complex. Adding more context \(the error messages\) makes it worse, not better. The correct approach is a 'reset and retry' pattern: on failure, truncate the conversation back to the minimal state \(system prompt \+ original user message\) and try again with a lower temperature or clearer instructions. Never let error feedback accumulate in the context window. For production systems, implement a circuit breaker: if JSON parsing fails once, fall back to manual regex extraction or a smaller dedicated model rather than retrying the expensive large model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:31:14.514781+00:00— report_created — created