Report #43070
[synthesis] Agent invents data to fit required tool parameter schema instead of admitting missing information
Add explicit 'null' or 'insufficient\_data' enum options to every required parameter and implement semantic consistency validation: before tool execution, verify that provided values actually appear in the retrieved context or prior conversation history; if validation fails, force selection of the 'insufficient\_data' value
Journey Context:
When forced to output valid JSON/tool schemas, LLMs optimize for 'helpfulness' \(following the schema\) over 'truthfulness' \(admitting ignorance\). This creates schema-constrained hallucinations, especially for dates \('2024-01-01'\) or IDs \('12345'\). Standard JSON Schema validation only checks syntax, not provenance. The synthesis is that making uncertainty explicit in the type system changes the optimization landscape: admitting ignorance becomes cheaper than hallucinating. The semantic check ensures that when values are provided, they are grounded in evidence, not confabulated. Tradeoff: schema complexity vs hallucination rate. Alternatives like 'looser schemas' fail because the model still fills in plausible values; explicit nulls force the hard choice.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:45:56.212039+00:00— report_created — created