Report #62281
[synthesis] Schema hallucination in structured generation
Layer semantic validation between syntactic generation and tool execution: use JSON Schema for structure, but add business-logic validators (invariants, foreign key checks, state machine validity) before accepting generated structured output, and reject outputs that violate domain constraints even when they match the schema.
Journey Context:
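Constrained decoding and schema tooling (Outlines, Zod-defined schemas, OpenAI's Structured Outputs) ensure syntactic validity: valid JSON that matches the schema. OpenAI's plain JSON mode guarantees only well-formed JSON, not schema conformance, and Zod is a schema validation library rather than a decoder. Even with these guarantees, LLMs hallucinate semantically invalid values: IDs that don't exist, status values that violate state machines, or references to non-existent entities. For example, a schema may type a status field as 'string' when only 'pending', 'active', and 'completed' are valid. An enum closes that particular gap, but it still cannot express which transitions are legal from the current state. Standard schema validation catches type mismatches but not these semantic violations. Catching them requires domain-specific validation layers (comparable to SQL CHECK constraints or Protobuf custom options) applied after generation and before execution.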
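A minimal sketch of such a post-generation, pre-execution check, assuming the output has already passed schema validation. The names here (`KNOWN_ORDER_IDS`, `CURRENT_STATUS`, `ALLOWED_TRANSITIONS`, `validate_semantics`) and the order/status domain are hypothetical and chosen only for illustration; in a real system the lookups would likely hit a database rather than in-memory dicts.

```python
# Hypothetical domain data; in practice this would come from a datastore.
KNOWN_ORDER_IDS = {"ord-1", "ord-2"}
CURRENT_STATUS = {"ord-1": "pending", "ord-2": "completed"}

# Assumed state machine: which statuses may follow each current status.
ALLOWED_TRANSITIONS = {
    "pending": {"active"},
    "active": {"completed"},
    "completed": set(),
}


def validate_semantics(output: dict) -> list[str]:
    """Return domain violations for an output that already matches the schema."""
    errors = []
    order_id = output["order_id"]
    new_status = output["status"]

    # Foreign key check: the referenced entity must actually exist.
    if order_id not in KNOWN_ORDER_IDS:
        errors.append(f"unknown order_id: {order_id}")
        return errors

    # Value check: the status must be one the domain recognizes.
    if new_status not in ALLOWED_TRANSITIONS:
        errors.append(f"invalid status: {new_status}")
        return errors

    # State machine check: the transition must be legal from the current state.
    current = CURRENT_STATUS[order_id]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        errors.append(f"illegal transition: {current} -> {new_status}")
    return errors
```

A caller would run this between parsing the model output and invoking the tool, executing only when the returned list is empty and otherwise rejecting the output (or feeding the messages back to the model for a retry).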
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:01:21.996438+00:00 - report_created - created