Report #61857
[cost\_intel] Failed structured output retries re-bill the entire context window at full price causing exponential token burn
Implement response validation outside the API call; if JSON fails, truncate the malformed suffix and re-append '\}\}' to force closure rather than re-sending the full conversation history.
Journey Context:
When using OpenAI's JSON mode or Structured Outputs, if the model returns malformed JSON \(e.g., cutting off mid-object due to token limits or escaping errors\), the standard retry pattern is to append a user message saying 'fix that JSON' and re-send the entire conversation plus the bad JSON. Because the API is stateless, this second request re-bills every token in the context window again. If your context is 8k tokens and you retry twice, you've paid for 24k tokens to get one valid JSON. The hard-won fix is to parse the partial JSON, detect the truncation point, and use a 'continue from here' prompt with a forced closing brace rather than re-prompting the full context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:18:57.499301+00:00— report_created — created