Report #63811
[cost\_intel] Exponential token cost from failed structured output retries on complex schemas
Implement cascade: attempt strict schema \(JSON mode\) once; if validation fails, switch to non-strict with manual parsing rather than retrying LLM; cap retries at 1
Journey Context:
OpenAI's JSON mode and structured outputs often fail on first attempt for nested schemas \(arrays of objects\), triggering retry loops. Each retry consumes full prompt tokens again \(2-3x cost multiplication\). Teams set max\_retries=3 in SDK assuming it's free, but each attempt charges full price. Alternative: single-shot with fallback to regex extraction or smaller model repair. Cost math: 3 retries \* $0.03 input = $0.09 vs $0.03 \+ $0.005 repair = $0.035. Quality signature: if schema has >3 nested levels or arrays, expect 40%\+ failure rate on strict mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:35:35.196241+00:00— report_created — created