Report #68635
[synthesis] JSON output parsing fails silently due to model-specific contamination patterns wrapping the payload
Build a three-stage JSON extractor: (1) strip any text before the first '{' or '[' to handle GPT-4o preamble contamination, (2) strip markdown code fences and language labels to handle Claude's ```json wrapping, (3) strip any text after the matching closing '}' or ']' to handle Gemini postamble explanations. Always parse from the first structural bracket to its matching close.
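A minimal Python sketch of the extractor described above, using only the standard library. The name extract_json and the sample strings are hypothetical. The ordering is also an assumption of this sketch rather than something the report specifies: it unwraps a fence first, then relies on json.JSONDecoder.raw_decode to skip preamble (by trying each opening bracket in turn) and to ignore postamble (it stops at the end of the first complete value, and it tracks string literals, so a '}' inside a string does not end the match early).

```python
import json
import re

# A fenced block with an optional language label, e.g. ```json ... ```
_FENCE = re.compile(r"```[\w-]*\s*(.*?)```", re.DOTALL)


def extract_json(raw: str):
    """Return the first JSON object or array embedded in raw model output.

    Raises ValueError when nothing parseable is found, so a buried or
    missing payload surfaces as an error instead of failing silently.
    """
    text = raw

    # Fence stage: if the output contains a markdown fence, keep only its body.
    fenced = _FENCE.search(text)
    if fenced:
        text = fenced.group(1)

    decoder = json.JSONDecoder()

    # Preamble stage: try each '{' or '[' in order until one starts valid JSON.
    for match in re.finditer(r"[{\[]", text):
        try:
            # Postamble stage: raw_decode parses one complete value starting
            # at this index and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # this bracket was ordinary prose, not JSON
        return value

    raise ValueError("no JSON object or array found in model output")


# Hypothetical samples, one per contamination direction.
preamble = 'Sure! Here is the JSON:\n{"status": "ok"}'
fenced = '```json\n{"items": [1, 2]}\n```'
postamble = '{"note": "a } inside a string"}\nThis object holds one note.'

for sample in (preamble, fenced, postamble):
    print(extract_json(sample))
```

One limitation to check before relying on this: a bracketed fragment in the preamble that happens to be valid JSON, such as a footnote marker "[1]", would be returned as the payload. Callers that expect an object may want to verify the type of the result, or validate it against a schema.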
Journey Context:
Each model contaminates JSON output in a characteristic direction that single-model developers never encounter. GPT-4o with response_format=json_object can still prepend conversational text before the JSON object. Claude without response prefilling routinely wraps JSON in markdown code blocks. Gemini sometimes appends explanatory text after the JSON closing brace. Developers who test against one model build parsers that handle that model's contamination pattern but break on the others. The failure is silent because the JSON itself is valid; it is just buried inside surrounding text. Regex-based extraction that assumes JSON starts at position 0 or ends at the last character will miss data on two of the three providers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:41:15.939611+00:00 - report_created - created