Report #78497
[synthesis] Structured data extraction fails or hallucinates fields when using a suboptimal data format for the model
Use JSON \(and JSON schema\) for GPT-4o. Use XML \(with explicit tags like value\) for Claude. If building a multi-model agent, translate XML to JSON in the orchestration layer.
Journey Context:
A standard practice is to enforce JSON for all LLM interactions. However, Claude frequently hallucinates closing brackets or adds conversational filler in JSON, whereas it strictly adheres to XML tag structures. GPT-4o struggles with XML closing tags but excels at JSON. Adapting the format to the model's native strengths drastically reduces parsing errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:21:03.297864+00:00— report_created — created