Report #41981
[counterintuitive] Writing regex parsers to extract JSON from LLM responses prompted with 'Return ONLY valid JSON'
Use native JSON mode or Structured Outputs APIs that guarantee valid syntax via constrained decoding
Journey Context:
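The folklore prompt 'Return ONLY JSON' often led models to wrap the JSON in markdown fences (```json) or emit trailing commas, forcing developers to write brittle regex extraction and repair logic. Modern APIs (OpenAI, Anthropic, Gemini) now support constrained decoding. The model's output is forced through a grammar derived from a JSON Schema (or a Pydantic model converted to one) at the token level: at each step, tokens that would violate the grammar are masked out before one is chosen. Because the model cannot emit a grammar-breaking token, syntactically invalid JSON is ruled out by construction, which makes regex parsers both unnecessary and a source of fragility. What remains are different failure modes: a response cut off at the token limit, or a refusal, can still be incomplete and should be checked before parsing.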
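The following is a toy sketch in Python that shows what token-level masking does. It does not describe how any vendor implements constrained decoding. The vocabulary, the scoring function `toy_model_scores`, and the set `ALLOWED_OUTPUTS` are hypothetical stand-ins. Production systems compile a JSON Schema into a grammar or automaton over a real tokenizer's vocabulary. Here the "grammar" is just the finite set of strings that satisfy the schema {"ok": boolean}, which is the simplest case of the same idea. The fake model deliberately prefers a markdown fence, so unconstrained greedy decoding reproduces the failure described above, while the masked decoder can only produce valid JSON.

```python
import json

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of sub-word tokens.
EOS = "<eos>"
VOCAB = ["```json\n", "```", "{", "}", '"ok"', ":", "true", "false", ",", EOS]

# Hypothetical "grammar": every string that satisfies the schema {"ok": boolean}.
# Real systems compile a JSON Schema into a grammar/automaton instead of listing outputs.
ALLOWED_OUTPUTS = {'{"ok":true}', '{"ok":false}'}


def toy_model_scores(prefix: str) -> dict[str, float]:
    """Stand-in for an LLM's next-token scores (ignores the prefix for simplicity).

    It prefers a markdown fence and a trailing comma, mimicking the failure mode
    that 'Return ONLY JSON' prompts run into.
    """
    scores = {tok: 0.0 for tok in VOCAB}
    scores.update({"```json\n": 5.0, ",": 4.0, "{": 3.0, '"ok"': 2.0,
                   ":": 2.0, "true": 1.5, "}": 1.0, EOS: 0.5})
    return scores


def allowed_tokens(prefix: str) -> set[str]:
    """Mask: keep only tokens that leave the output on a path to a valid document."""
    allowed = set()
    for tok in VOCAB:
        if tok == EOS:
            # End-of-sequence is legal only once the prefix is a complete document.
            if prefix in ALLOWED_OUTPUTS:
                allowed.add(tok)
        elif any(out.startswith(prefix + tok) for out in ALLOWED_OUTPUTS):
            allowed.add(tok)
    return allowed


def decode(constrained: bool, max_steps: int = 10) -> str:
    """Greedy decoding, optionally restricted by the grammar mask."""
    prefix = ""
    for _ in range(max_steps):
        scores = toy_model_scores(prefix)
        candidates = allowed_tokens(prefix) if constrained else set(VOCAB)
        if not candidates:
            raise RuntimeError("grammar admits no continuation of " + repr(prefix))
        # Pick the highest-scoring token among those the mask permits.
        tok = max(candidates, key=lambda t: scores[t])
        if tok == EOS:
            break
        prefix += tok
    return prefix


if __name__ == "__main__":
    print(repr(decode(constrained=False)))   # repeated markdown fences, not JSON
    out = decode(constrained=True)
    print(json.loads(out))                    # {'ok': True}
```

The constraint acts on each next-token choice rather than on the finished text. As a result, the consumer needs only a plain `json.loads` and no regex extraction. A decode stopped early by `max_steps`, the toy analogue of a token limit, could still be incomplete.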
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:56:21.173460+00:00— report_created — created