Report #79396
[synthesis] Model fails to reliably extract complex nested structured data using JSON
For Claude, switch from JSON output requests to XML output requests (e.g., ...) for complex extraction. For GPT-4o, stick to JSON or native Structured Outputs.
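As a rough illustration of the consuming side of this workaround, the sketch below parses an XML payload out of a model response using only the Python standard library. The tag names (`invoice`, `vendor`, `items`, `item`, `name`, `qty`), the sample response, and the helper functions are all hypothetical and chosen for illustration; no model API is called, and the report does not specify a schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical model response: prose wrapped around an XML payload.
SAMPLE_RESPONSE = """Here is the extracted data:
<invoice>
  <vendor>Acme Corp</vendor>
  <items>
    <item><name>Widget</name><qty>2</qty></item>
    <item><name>Gadget</name><qty>5</qty></item>
  </items>
</invoice>
Let me know if you need anything else."""


def extract_block(text: str, tag: str) -> str:
    """Return the first <tag>...</tag> span in text, or raise ValueError."""
    safe = re.escape(tag)
    match = re.search(rf"<{safe}>.*?</{safe}>", text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no <{tag}> block found")
    return match.group(0)


def parse_invoice(text: str) -> dict:
    """Turn the (assumed) invoice XML into a nested dict."""
    root = ET.fromstring(extract_block(text, "invoice"))
    return {
        "vendor": root.findtext("vendor", default="").strip(),
        "items": [
            {
                "name": item.findtext("name", default="").strip(),
                "qty": int(item.findtext("qty", default="0")),
            }
            for item in root.iterfind("items/item")
        ],
    }


if __name__ == "__main__":
    print(parse_invoice(SAMPLE_RESPONSE))
```

One caveat with this approach: `ET.fromstring` raises `ParseError` on unescaped characters such as `&` in element text, so the prompt may need to ask for escaped output, or the caller may need a more tolerant parser.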
Journey Context:
JSON is the standard data interchange format, so developers default to asking models to output JSON. However, LLMs generate tokens sequentially. In JSON, every nested object and array must be balanced by unlabeled closing braces and brackets, often stacked at the very end of a long structure, so a single miscounted or missing closer invalidates the whole document. XML closes each element with its own named tag as soon as that element ends, which keeps the structure explicit throughout a long generation, and Claude handles this format well, which this report attributes to its training. GPT-4o's JSON mode and Structured Outputs make JSON reliable there, but for Claude, XML is the more dependable choice for complex extraction.
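The failure mode described above can be illustrated with hand-written strings. The sketch below cuts the same two-item payload off mid-generation in both formats: the truncated JSON is rejected outright by `json.loads`, while the elements that were already closed in the XML can still be recovered with a simple pattern. The payloads and the helper names `try_json` and `complete_items` are hypothetical, and this demonstrates only the structural difference, not measured model behavior.

```python
import json
import re

# The same two-item payload, cut off partway through the second item.
TRUNCATED_JSON = '{"items": [{"name": "Widget", "qty": 2}, {"name": "Gadget", "qty"'
TRUNCATED_XML = (
    "<items><item><name>Widget</name><qty>2</qty></item>"
    "<item><name>Gadget</name><qty>"
)


def try_json(text: str):
    """Return the parsed JSON, or None if the document is not valid."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def complete_items(text: str) -> list:
    """Return (name, qty) for every <item> that was fully closed."""
    pattern = r"<item><name>(.*?)</name><qty>(\d+)</qty></item>"
    return [
        (name, int(qty))
        for name, qty in re.findall(pattern, text, flags=re.DOTALL)
    ]


if __name__ == "__main__":
    print(try_json(TRUNCATED_JSON))       # the whole document is rejected
    print(complete_items(TRUNCATED_XML))  # the closed item is still usable
```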
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:51:44.837825+00:00— report_created — created