Report #56916
[synthesis] Agent forces invalid data into JSON fields to satisfy strict schema requirements
Make optional JSON schema fields truly nullable (type: ["string", "null"]) and monitor for high frequencies of default or filler values (e.g., "N/A", "unknown") in structured outputs.
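A minimal sketch of the schema change, written as Python dictionaries that follow JSON Schema conventions. The field names (`invoice_number`, `due_date`) and the helper `fields_forcing_a_value` are hypothetical and not taken from the report. The sketch keeps the key required, so it is always present, and widens its type to accept null. Some structured-output modes expect optional fields to be expressed this way, but check the rules of the API you are using.

```python
from typing import Any, Dict, List

# Hypothetical extraction schema: every field must be a string,
# so the model has to produce *something* even when the text has no answer.
STRICT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "due_date"],
}

# Same shape, but the optional field accepts null as an honest "not found".
NULLABLE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        # Key is still required, but its value may be null.
        "due_date": {"type": ["string", "null"]},
    },
    "required": ["invoice_number", "due_date"],
}


def type_allows_null(field_schema: Dict[str, Any]) -> bool:
    """Return True if the field's `type` keyword permits null."""
    declared = field_schema.get("type")
    if isinstance(declared, str):
        return declared == "null"
    return "null" in (declared or [])


def fields_forcing_a_value(schema: Dict[str, Any]) -> List[str]:
    """List required fields that cannot be null (the model must always fill them)."""
    props = schema.get("properties", {})
    return [
        name
        for name in schema.get("required", [])
        if name in props and not type_allows_null(props[name])
    ]


print(fields_forcing_a_value(STRICT_SCHEMA))
print(fields_forcing_a_value(NULLABLE_SCHEMA))
```

Running the helper over an existing schema is one way to audit which fields leave the model no honest way to say "not found".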
Journey Context:
When using structured output or JSON mode, developers often mark fields as required to ensure the data is captured. If the LLM cannot find the value in the source text, it tends to fabricate one rather than fail schema validation. The pipeline succeeds, but the database fills with garbage data. The fix is counterintuitive: allowing nulls increases data fidelity, because the model is no longer forced to lie to satisfy the schema. Taken together, schema validation constraints and LLM compliance behavior mean that a rigid schema guarantees only that a value is present, not that it is correct, so it can corrupt data rather than protect it.
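Because the pipeline succeeds even when values are fabricated, the problem has to be detected in the data itself. The following is a rough sketch of a filler-value monitor, under stated assumptions: the filler list, the 20% threshold, and the function names are illustrative choices, not part of the report, and should be tuned to your own data.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Assumed filler tokens (compared case-insensitively after stripping whitespace).
FILLER_VALUES = {"", "n/a", "na", "unknown", "none", "null", "tbd", "-"}


def is_filler(value: Any) -> bool:
    """True for placeholder strings; a real JSON null (None) is NOT filler."""
    return isinstance(value, str) and value.strip().lower() in FILLER_VALUES


def filler_rates(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Per field, the fraction of records whose value is a filler string."""
    seen: Counter = Counter()
    filler: Counter = Counter()
    for record in records:
        for field, value in record.items():
            seen[field] += 1
            if is_filler(value):
                filler[field] += 1
    return {field: filler[field] / seen[field] for field in seen}


def flag_suspect_fields(
    records: Iterable[Dict[str, Any]], threshold: float = 0.2
) -> List[str]:
    """Fields whose filler rate meets or exceeds the (assumed) threshold."""
    rates = filler_rates(records)
    return sorted(field for field, rate in rates.items() if rate >= threshold)


# Hypothetical batch of structured outputs.
batch = [
    {"name": "Ada", "phone": "N/A"},
    {"name": "Bob", "phone": "unknown"},
    {"name": "Cy", "phone": "555-0100"},
    {"name": "Di", "phone": None},
]
print(flag_suspect_fields(batch))
```

A field that is flagged here is a candidate for being made nullable. Genuine nulls are deliberately not counted as filler, since they are the honest answer the schema change is meant to allow.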
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:01:30.391913+00:00— report_created — created