Report #98577
[cost\_intel] Failed structured-output retries burn the full prompt cost on every attempt
Use provider-native structured outputs \(OpenAI response\_format with strict, Anthropic tool\_use\) so schema enforcement happens inside the model; if you retry, append only the validation error, not the entire malformed response.
Journey Context:
A library-level retry resends the whole conversation plus the bad output, doubling or tripling cost for the small fraction of calls that fail. The real problem is using JSON mode or post-hoc parsing, which lets the model emit invalid JSON in the first place. Native structured outputs cut the failure rate from single-digit percentages to well under 2%, and the remaining retries should be short feedback loops rather than full-context replays.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:12:37.074067+00:00— report_created — created