Report #28963
[frontier] LLM returns malformed JSON or hallucinates fields during structured data extraction
Use constrained decoding (OpenAI JSON schema mode, PydanticAI, Outlines) instead of prompt engineering for format
Journey Context:
Prompting for JSON output ('respond with valid JSON...') fails 5-10% of the time with malformed syntax or schema violations. Constrained decoding (also called structured outputs) works at the sampler level: at each generation step, tokens that would violate the JSON schema are masked out, so the model can only emit schema-valid continuations. This guarantees syntactic correctness and reduces hallucinated fields, although the values placed in valid fields can still be wrong. It is distinct from JSON mode, which guarantees only valid JSON, not schema adherence. Tradeoff: schema changes require regenerating the constrained grammar.
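To make the sampler-level masking concrete, here is a toy sketch in Python. It is not how OpenAI, PydanticAI, or Outlines implement structured outputs. The vocabulary, the `fake_logits` scorer, and the hand-written `GRAMMAR` state machine are hypothetical stand-ins chosen for illustration, and a real system would compile the grammar from a JSON schema rather than write it by hand.

```python
import json

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ['{', '}', ':', ',', '"name"', '"age"', '"email"',
         '"Ada"', '"Bob"', '36', '41', 'Sure!']

# Hand-written state machine for the schema {"name": string, "age": integer}.
# Each state maps an allowed token to the next state; every other token is masked.
GRAMMAR = {
    "start":      {'{': "key_name"},
    "key_name":   {'"name"': "colon_name"},
    "colon_name": {':': "val_name"},
    "val_name":   {'"Ada"': "comma", '"Bob"': "comma"},
    "comma":      {',': "key_age"},
    "key_age":    {'"age"': "colon_age"},
    "colon_age":  {':': "val_age"},
    "val_age":    {'36': "close", '41': "close"},
    "close":      {'}': "done"},
}


def fake_logits(prefix):
    """Stand-in for a model forward pass: one score per vocabulary token."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores['Sure!'] = 5.0 if not prefix else 0.0  # chatty preamble at step 0
    scores['"email"'] = 4.0                       # a field the schema lacks
    scores['"Ada"'] = 3.0
    scores['36'] = 2.0
    return scores


def decode(constrained, max_steps=12):
    """Greedy decoding, optionally masking tokens the grammar disallows."""
    state, out = "start", []
    for _ in range(max_steps):
        if constrained and state == "done":
            break
        scores = fake_logits(out)
        if constrained:
            # The mask: keep only tokens that are legal in the current state.
            allowed = GRAMMAR[state]
            scores = {t: s for t, s in scores.items() if t in allowed}
        token = max(scores, key=scores.get)
        out.append(token)
        if constrained:
            state = GRAMMAR[state][token]
    return "".join(out)


if __name__ == "__main__":
    print(decode(constrained=True))   # {"name":"Ada","age":36}
    print(decode(constrained=False))  # starts with Sure!"email" and never parses
```

In this toy setup the unconstrained run follows the scorer's preferences into a preamble and an undeclared `"email"` field, while the constrained run cannot leave the schema because those tokens are never eligible. The same structure also shows the tradeoff noted above: changing the schema means rebuilding `GRAMMAR`.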
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:00:32.572634+00:00— report_created — created