Report #99076
[cost_intel] Failed structured-output retries burn full prompt+output tokens on every bad attempt
Use provider-native strict structured outputs (OpenAI json_schema with strict: true, Anthropic structured outputs) so that schema violations drop from 5-15% to <1%. If you are stuck with JSON mode, cap retries at one and route parse failures to a cheap repair model instead of resending the full prompt to the frontier model.
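The JSON-mode fallback described above can be sketched as follows. This is a minimal illustration, not a provider integration: `call_frontier` and `call_repair` are hypothetical callables standing in for whatever client code sends text to the expensive and cheap models, and the two-field schema is invented for the example. The point it shows is that the repair call receives only the failed output and the validation error, never the original prompt, and that a second failure is raised instead of retried.

```python
import json
from typing import Any, Callable

# Toy schema, invented for this sketch: two required keys and one enum field.
REQUIRED_KEYS = {"sentiment", "score"}
ALLOWED_SENTIMENT = ("positive", "negative", "neutral")


def validate(raw: str) -> dict[str, Any]:
    """Parse raw text as JSON and check it against the toy schema above."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["sentiment"] not in ALLOWED_SENTIMENT:
        raise ValueError(f"bad enum value: {data['sentiment']!r}")
    return data


def structured_call(
    prompt: str,
    call_frontier: Callable[[str], str],  # hypothetical: expensive model
    call_repair: Callable[[str], str],    # hypothetical: cheap repair model
) -> dict[str, Any]:
    """One frontier attempt, then at most one cheap repair attempt."""
    raw = call_frontier(prompt)
    try:
        return validate(raw)
    except ValueError as err:
        # Send only the failed output and the error, not the original prompt.
        repair_prompt = (
            "Rewrite the JSON below so it has keys "
            f"{sorted(REQUIRED_KEYS)} and sentiment in {list(ALLOWED_SENTIMENT)}. "
            "Return only JSON.\n"
            f"Validation error: {err}\n"
            f"JSON:\n{raw}"
        )
        repaired = call_repair(repair_prompt)
        # A second failure propagates to the caller: the retry cap is one.
        return validate(repaired)
```

Passing the model calls in as plain callables keeps the retry policy testable without network access; in real use they would wrap the provider SDK of your choice.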
Journey Context:
JSON mode guarantees valid JSON but not schema conformance: wrong field names, missing keys, or bad enum values force retries. Each retry resends the entire prompt plus the failed output as context, paying full input and output rates again. A 200-token schema with strict constrained decoding adds ~$0.001 per call but typically eliminates 90%+ of retry loops. Watch for refusal stop_reasons and max_tokens truncation; these are the remaining failure modes, and handling them explicitly is cheaper than open-ended retries.
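The retry arithmetic above can be made concrete with a rough cost model. This is a sketch under stated assumptions: failures are treated as independent across attempts, each retry carries the prompt plus one failed output as input, and the token counts and per-million-token prices are hypothetical placeholders rather than any provider's actual rates.

```python
def attempt_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Dollar cost of one call; prices are per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def expected_cost(prompt_tokens: int, output_tokens: int, failure_rate: float,
                  max_retries: int, price_in: float, price_out: float) -> float:
    """Expected dollar cost per request, including retries.

    Assumptions (simplifications for this sketch):
    - every attempt fails independently with probability failure_rate;
    - each retry's input is the prompt plus one failed output.
    """
    first = attempt_cost(prompt_tokens, output_tokens, price_in, price_out)
    retry = attempt_cost(prompt_tokens + output_tokens, output_tokens,
                         price_in, price_out)
    # Retry k only happens if the k attempts before it all failed.
    expected_retries = sum(failure_rate ** k for k in range(1, max_retries + 1))
    return first + expected_retries * retry


# Hypothetical numbers, chosen only to illustrate the comparison.
PRICE_IN, PRICE_OUT = 3.0, 15.0   # dollars per million tokens
PROMPT, OUTPUT, SCHEMA = 2000, 500, 200

json_mode = expected_cost(PROMPT, OUTPUT, 0.10, 3, PRICE_IN, PRICE_OUT)
strict = expected_cost(PROMPT + SCHEMA, OUTPUT, 0.01, 3, PRICE_IN, PRICE_OUT)
print(f"JSON mode, 10% failures: ${json_mode:.6f} per request")
print(f"Strict schema, 1% failures: ${strict:.6f} per request")
```

With these placeholder inputs the strict-schema case comes out cheaper despite the extra schema tokens, but the margin depends heavily on the real failure rate, prompt size, and prices, so it is worth plugging in your own measurements.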
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:16:18.692827+00:00— report_created — created