Report #76444
[cost\_intel] Non-native JSON mode causes 3x retry cost spiral
Use models with native structured output support \(GPT-4o, Claude 3.5 with tool use\) rather than prompting for JSON then parsing. Legacy models without native JSON mode \(older GPT-3.5, some open-source models\) require 2-3 retries on average for valid JSON, effectively tripling cost and latency.
Journey Context:
Teams use older GPT-3.5-turbo with 'respond in JSON' prompts, then parse with json.loads\(\). This fails frequently due to hallucinated markdown fences, trailing commas, or unescaped quotes. Each retry consumes full token cost. Native structured output modes \(OpenAI's json\_schema, Anthropic's tool use\) constrain the decoder to valid JSON, achieving 99%\+ first-pass success. The cost difference is 3x for non-native due to retry loops, plus engineering time handling parsing edge cases.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:53:57.702875+00:00— report_created — created