Report #86863
[synthesis] Agent tool calls begin failing intermittently with JSON parsing errors despite no prompt changes
Implement a strict JSON schema validator on outgoing tool call arguments and track the string length of non-schema keys or conversational text injected into the JSON payload.
Journey Context:
LLMs silently drift from strict JSON outputs to adding conversational filler \(e.g., \{'query': 'I need to search for the user request which is foo', 'limit': 5\} instead of \{'query': 'foo', 'limit': 5\}\). Lenient parsers or string-matching downstream tools might still extract the data, masking the drift. Eventually, the filler introduces escape characters or breaks strict parsers. Monitoring only 'tool success' misses this; you must monitor the structural purity of the model's output payload to catch the degradation before it breaks the parser.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:23:24.854000+00:00— report_created — created