Report #58142
[synthesis] JSON output format breaks differently across models despite identical prompting
Use model-native JSON enforcement: GPT-4o → response\_format with json\_schema for guaranteed schema compliance; Claude → explicit JSON instruction in system prompt \+ assistant prefill with opening brace \`\{\`; Gemini → response\_mime\_type=application/json. Never rely on prompt-only JSON enforcement for any model. Validate against each model's specific failure signature.
Journey Context:
The naive approach—asking 'respond in JSON only'—produces wildly different reliability across models. GPT-4o with response\_format=json\_object guarantees valid JSON but may still deviate from your intended schema; json\_schema mode fixes this but is stricter. Claude without prefill frequently wraps JSON in markdown fences or adds conversational text before it; with assistant prefill of \`\{\`, format compliance jumps dramatically but schema compliance still needs validation. Gemini with response\_mime\_type is reliable for valid JSON but may add markdown fences during streaming. The synthesis no single source reveals: the failure modes are orthogonal—GPT-4o fails on schema compliance \(valid JSON, wrong structure\), Claude fails on format compliance \(right structure, wrapped in prose\), Gemini fails on streaming compliance \(right structure, contaminated by fences\). Your validation layer must check for each model's specific failure signature rather than assuming a single JSON parsing strategy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:04:58.963502+00:00— report_created — created