Report #51807
[synthesis] Agent outputs schema-valid JSON but populates fields with semantically incorrect content
Implement semantic validation in addition to syntactic schema validation. Use a lightweight classifier or embedding similarity check to verify that the content of a field matches the semantic intent of the schema key (e.g., ensure 'action_items' actually contains tasks, not summaries).
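A minimal sketch of such a check, assuming a two-field extraction schema ('action_items' and 'summary'). The prototype phrases, the stopword list, the `min_score` threshold, and the function names are all hypothetical, and the bag-of-words cosine similarity is only a dependency-free stand-in for a real embedding model or classifier:

```python
import math
import re
from collections import Counter

# Hypothetical prototype phrases describing what each field should contain.
# In practice these would be replaced by embeddings or a trained classifier.
FIELD_PROTOTYPES = {
    "action_items": [
        "send the report to the client by friday",
        "schedule a follow up meeting with the team",
        "fix the login bug and deploy the patch",
        "review the pull request and update the docs",
    ],
    "summary": [
        "the team discussed the project status and agreed on priorities",
        "the meeting covered the budget and the timeline was reviewed",
        "overall the discussion was about progress and open risks",
    ],
}

# Small illustrative stopword list so function words do not dominate scores.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "on", "by", "with", "was", "about"}


def _vector(text):
    """Bag-of-words term counts (toy substitute for an embedding)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def field_scores(text):
    """Best similarity between the text and each field's prototypes."""
    vec = _vector(text)
    return {
        field: max(_cosine(vec, _vector(p)) for p in protos)
        for field, protos in FIELD_PROTOTYPES.items()
    }


def semantic_errors(payload, min_score=0.2):
    """Flag values that resemble another field more than their own key."""
    errors = []
    for field, value in payload.items():
        if field not in FIELD_PROTOTYPES:
            continue  # no semantic rule defined for this key
        items = value if isinstance(value, list) else [value]
        for item in items:
            scores = field_scores(str(item))
            best = max(scores, key=scores.get)
            if best != field or scores[field] < min_score:
                errors.append(
                    f"{field}: {item!r} scored {scores[field]:.2f} (best match: {best})"
                )
    return errors


good = {
    "action_items": ["send the budget report to the client"],
    "summary": "the team discussed the timeline and agreed on priorities",
}
bad = {
    "action_items": ["the team discussed the budget and the timeline"],
    "summary": "the team discussed the timeline and agreed on priorities",
}
good_errors = semantic_errors(good)
bad_errors = semantic_errors(bad)
```

Run after schema validation rather than instead of it: the schema check guards structure, and a check like this guards meaning. The threshold and prototypes would need tuning against real outputs before the result is trusted.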
Journey Context:
Teams implement strict JSON schema validation (like Pydantic or OpenAI Structured Outputs) and assume that if the JSON is valid, the extraction was successful. However, LLMs often struggle with semantic boundaries, dumping conversational filler or adjacent data into the wrong fields just to satisfy the schema. The agent appears healthy because schema validation passes, but downstream consumers of the JSON fail silently on bad data.
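The failure mode can be reproduced in a few lines. This sketch assumes a hypothetical meeting-notes payload and uses a hand-written type check as a stand-in for a real schema validator such as Pydantic; the field names and sample strings are illustrative only:

```python
import json

# Hypothetical model output: well-formed JSON with the right keys and types,
# but 'summary' holds conversational filler and 'action_items' holds a summary.
raw_output = json.dumps({
    "summary": "Sure! Here is the information you asked for.",
    "action_items": ["The team discussed the budget and the timeline."],
})


def is_structurally_valid(payload):
    """Check only keys and types, which is all a plain schema check sees."""
    items = payload.get("action_items")
    return (
        isinstance(payload.get("summary"), str)
        and isinstance(items, list)
        and all(isinstance(item, str) for item in items)
    )


payload = json.loads(raw_output)
structurally_valid = is_structurally_valid(payload)  # passes despite wrong content
```

Nothing in the type check can distinguish a task from a sentence about a meeting, which is why the pipeline reports success while the consumer receives unusable data.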
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:27:05.881946+00:00— report_created — created