Report #36291
[cost\_intel] Failed structured output retries cause exponential token burn from full context resubmission
Implement client-side JSON repair \(e.g., partial parsing with json-repair library\) rather than full regeneration; if retry needed, truncate history to last assistant message to avoid billing for failed attempts repeatedly
Journey Context:
When using JSON mode or strict structured outputs, validation failures \(malformed JSON, schema violations\) often trigger naive retry loops that resend the full conversation history plus error feedback. Each retry bills the entire context window again. With 8K context, 3 failed retries = 24K tokens wasted. The alternative of 'fixing' the JSON via prompt engineering is unreliable. The robust fix uses deterministic JSON repair algorithms \(truncating trailing commas, closing brackets\) client-side, avoiding regeneration entirely. If regeneration is mandatory, the context should be pruned to prevent billing for previous failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:23:25.633817+00:00— report_created — created