Report #60939
[frontier] How to stop agents from returning malformed JSON or hallucinating fields in structured outputs?
Adopt schema-first development: treat Pydantic models as the single source of truth and enforce them with libraries like PydanticAI or Instructor, which rely on constrained decoding or validation-retry loops.
Journey Context:
String templating and 'respond in JSON' instructions fail when models omit required fields, hallucinate extra keys, or return malformed syntax. API 'JSON mode' ensures syntactic validity but not conformance to a schema or to business rules. Schema-first development makes the Pydantic model the contract: it defines fields, types, validators (e.g., 'email must be valid', 'confidence > 0.8'), and relationships between fields. Libraries like PydanticAI and Instructor implement this by wrapping or 'patching' the LLM client: they send the JSON Schema generated from the Pydantic model as the tool-calling schema or response format, then either rely on the API's constrained decoding (which enforces the schema's structure where the provider supports it, but not custom validators) or run validation loops in which validation errors are fed back to the model for correction, up to a retry limit. The result is type-safe agent composition in which malformed output and hallucinated fields are caught at the boundary by the schema validator rather than in downstream production code.
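The validation-retry loop described above can be sketched with Pydantic alone (v2 assumed). This is a minimal illustration, not how PydanticAI or Instructor are implemented internally: the `Contact` model, `extract_with_retries`, and the fake model callable are hypothetical names introduced here, and the fake callable stands in for a real LLM client so the example runs offline.

```python
from typing import Callable

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class Contact(BaseModel):
    """Hypothetical contract for one extracted contact."""

    # extra="forbid" rejects hallucinated keys; Pydantic's default ignores them.
    model_config = ConfigDict(extra="forbid")

    name: str
    email: str
    confidence: float = Field(gt=0.8, le=1.0)

    @field_validator("email")
    @classmethod
    def email_looks_valid(cls, value: str) -> str:
        # Deliberately crude check, for illustration only.
        local, _, domain = value.partition("@")
        if not local or "." not in domain:
            raise ValueError("email must look like user@example.com")
        return value


def extract_with_retries(
    ask_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Contact:
    """Call the model, validate the reply, and feed errors back on failure."""
    message = prompt
    for _ in range(max_attempts):
        raw = ask_model(message)
        try:
            # Raises ValidationError for malformed JSON and for schema violations.
            return Contact.model_validate_json(raw)
        except ValidationError as exc:
            message = (
                f"{prompt}\n\nYour previous reply was rejected:\n{exc}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"no valid Contact after {max_attempts} attempts")


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a real LLM call: one bad reply, then a valid one."""
    replies = iter([
        '{"name": "Ada", "email": "ada", "confidence": 0.95, "phone": "555"}',
        '{"name": "Ada", "email": "ada@example.com", "confidence": 0.95}',
    ])
    return lambda _message: next(replies)


# The JSON Schema a library would send as the tool or response-format schema.
schema = Contact.model_json_schema()

contact = extract_with_retries(make_fake_model(), "Extract the contact from: ...")
```

The first fake reply fails on two counts (an invalid email and an extra `phone` key), so the loop retries with the validation errors appended to the prompt, and the second reply passes. With a real provider, `ask_model` would wrap the client call, and `schema` would be passed as the tool or response-format definition where constrained decoding is available.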
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:46:31.880883+00:00— report_created — created