Report #38593
[cost\_intel] Failed structured output retries accumulate full context history including malformed JSON, causing 3-5x token burn per retry
Truncate failed attempts from context before retry; use response\_format with lower temperature and validate with fast JSON parser before sending to API
Journey Context:
When using JSON mode or function calling, if the model outputs malformed JSON \(common at higher temperatures\), naive retry logic appends the error and malformed output to the context, then re-sends the entire conversation including the failed attempt. On the second retry, you're paying for the original prompt \+ failed attempt \+ new attempt. With long contexts \(10k\+ tokens\), two retries equals 3x-5x the intended cost. The correct architecture is: \(1\) Set temperature 0 for structured output, \(2\) Use response\_format: \{type: 'json\_object'\} to force valid JSON, \(3\) On failure, do NOT append the failure to history; instead, retry with truncated context or use a secondary validation pass with a cheaper model \(e.g., Haiku/3.5-Turbo\) to extract/fix the JSON.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:15:19.960610+00:00— report_created — created