Report #71787
[synthesis] Agent outputs valid JSON but hallucinates dummy values for fields it cannot populate instead of returning null
Implement programmatic validation of structured outputs against business-logic constraints (e.g., 'value must be > 0'), not just schema validation, and track how often default or placeholder values appear in the LLM's responses.
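A minimal sketch of the business-logic check described above, assuming a payload with two illustrative fields. The field names `value` and `currency`, the `RULES` table, and the function `validate_business_rules` are hypothetical and not part of any library; only the standard-library `json` module is used. Null is treated as an acceptable "I don't know", so only non-null values are checked.

```python
import json
from typing import Any, Callable

# Hypothetical business rules: field name -> predicate every non-null value must satisfy.
RULES: dict[str, Callable[[Any], bool]] = {
    # A number strictly greater than zero (bool is excluded because it subclasses int).
    "value": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool) and v > 0,
    # A three-letter alphabetic code; rejects strings such as "N/A".
    "currency": lambda v: isinstance(v, str) and len(v) == 3 and v.isalpha(),
}


def validate_business_rules(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the payload passed."""
    payload = json.loads(raw)  # parsing still runs first and raises on malformed JSON
    errors = []
    for field, rule in RULES.items():
        value = payload.get(field)
        if value is None:
            continue  # null (or a missing field) is allowed; only dummy values are rejected
        if not rule(value):
            errors.append(f"{field}: rejected value {value!r}")
    return errors
```

With these assumed rules, a response such as `{"value": 0, "currency": "N/A"}` parses cleanly but yields two violations, while `{"value": null, "currency": "USD"}` passes.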
Journey Context:
When using JSON mode or function calling, LLMs are trained to be helpful, which can conflict with faithfully reporting that a field has no known value. If the LLM cannot find the answer, it will often fill in dummy data (like 'N/A', 'unknown', or 0) rather than returning null or raising an error, because a fully populated object looks like the most likely way to 'succeed' in completing the schema. The parser succeeds, but downstream logic fails. Taken together, structured output mechanics and this model behavior show that schema validation is necessary but insufficient: dummy values are schema-valid, so standard JSON parsers and schema validators pass them silently. The leading indicator of the problem is a spike in default or dummy values.
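One way to watch for that spike is to compute, per field, the share of responses that contain a placeholder. The sketch below is illustrative only: the `PLACEHOLDER_STRINGS` vocabulary and the helper names `is_placeholder` and `placeholder_rates` are assumptions, and treating a numeric 0 as a placeholder is a heuristic that will produce false positives for fields where zero is a legitimate value.

```python
from collections import Counter
from typing import Any

# Assumed placeholder vocabulary (compared case-insensitively); tune to your own data.
PLACEHOLDER_STRINGS = {"", "n/a", "na", "unknown", "none", "null", "tbd"}


def is_placeholder(value: Any) -> bool:
    """Heuristic: does this value look like filler rather than a real answer?"""
    if isinstance(value, bool):
        return False  # booleans are never counted, even though bool subclasses int
    if isinstance(value, str):
        return value.strip().lower() in PLACEHOLDER_STRINGS
    if isinstance(value, (int, float)):
        return value == 0  # assumption: zero is suspicious for the tracked fields
    return False  # None, lists, and nested objects are not counted


def placeholder_rates(responses: list[dict[str, Any]]) -> dict[str, float]:
    """Fraction of responses in which each field held a placeholder value."""
    hits: Counter[str] = Counter()
    seen: Counter[str] = Counter()
    for response in responses:
        for field, value in response.items():
            seen[field] += 1
            if is_placeholder(value):
                hits[field] += 1
    return {field: hits[field] / seen[field] for field in seen}
```

Comparing these rates against a baseline window (for example, alerting when a field's rate rises well above its historical level) is left to whatever monitoring stack is already in place. An explicit null is deliberately not counted, since that is the behavior the agent should be exhibiting.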
Lifecycle
2026-06-21T03:04:45.117301+00:00— report_created — created