Report #39472
[synthesis] Schema Hallucination: Metadata Leakage vs. Schema Improvement
For GPT-4o, add a post-processing step that strips leaked schema metadata (e.g., removing a stray type: object entry from values). For Claude, add an explicit 'Do not add keys not in the schema' instruction to the prompt, plus a key-filtering step as a backstop.
Journey Context:
When generating structured data, GPT-4o often leaks parts of the schema definition into the output values (metadata leakage). Claude tends to 'improve' the output by adding keys it thinks are missing, such as a 'status' field (schema improvement). A JSON Schema validator catches both (extra keys only if the schema sets additionalProperties to false), but the fix differs: GPT-4o needs output stripping, while Claude needs prompt constraints.
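The two clean-up steps described above might look roughly like the following. This is a minimal sketch, not a verified fix: the schema, the sample payloads, and the function names strip_leaked_metadata and filter_to_schema_keys are all hypothetical, and the leak shape (a "type" entry that echoes the schema node's own type) is an assumption about what the leaked metadata looks like. Arrays are not handled.

```python
from typing import Any

# Hypothetical schema, used only for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}


def strip_leaked_metadata(data: Any, schema: dict) -> Any:
    """Remove a leaked 'type' entry that merely echoes the schema node's type."""
    if not isinstance(data, dict):
        return data
    props = schema.get("properties", {})
    cleaned = {}
    for key, value in data.items():
        # Assumed leak shape: {"type": "object", ...real fields...}.
        # A declared property that happens to be named "type" is kept.
        if key == "type" and key not in props and value == schema.get("type"):
            continue
        cleaned[key] = strip_leaked_metadata(value, props.get(key, {}))
    return cleaned


def filter_to_schema_keys(data: Any, schema: dict) -> Any:
    """Drop keys the schema does not declare (e.g. an invented 'status')."""
    props = schema.get("properties")
    if not isinstance(data, dict) or props is None:
        return data
    return {
        key: filter_to_schema_keys(value, props[key])
        for key, value in data.items()
        if key in props
    }


# Hypothetical outputs showing each failure mode.
leaky_output = {"name": "Ada", "address": {"type": "object", "city": "London"}}
extra_key_output = {"name": "Ada", "address": {"city": "London"}, "status": "ok"}

print(strip_leaked_metadata(leaky_output, SCHEMA))
print(filter_to_schema_keys(extra_key_output, SCHEMA))
```

The key filter would also remove the leaked entry in this particular example, but keeping the two steps separate makes it possible to log which failure mode occurred for which model. The filter is only a backstop for the prompt instruction, since silently dropping keys can hide a prompt that is not being followed.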
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:43:42.036510+00:00 — report_created — created