Report #45283
[synthesis] Agent outputs look successful but contain subtly drifted or hallucinated tool arguments
Apply strict schema validation and semantic similarity checks to the *arguments the agent emits* for each tool call, not just the tool name. Track the edit distance or embedding distance between those arguments and canonical examples over time.
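As a rough illustration of the schema-validation half of this advice, the sketch below checks a tool call's arguments against a declared schema using only the standard library. The names `CREATE_FILE_SCHEMA` and `validate_args` are hypothetical, and the schema for `create_file` is assumed from the example call in this report; a real deployment would more likely use a full JSON Schema validator.

```python
from typing import Any

# Hypothetical schema for the create_file tool: argument name -> required type.
# Assumed from the example call in this report, not taken from a real tool spec.
CREATE_FILE_SCHEMA: dict[str, type] = {"path": str, "content": str}


def validate_args(args: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of violations; an empty list means the arguments conform."""
    errors = []
    # Strict mode: every declared argument must be present...
    for name in sorted(schema.keys() - args.keys()):
        errors.append(f"missing argument: {name}")
    # ...and nothing undeclared (for example a hallucinated name) may appear.
    for name in sorted(args.keys() - schema.keys()):
        errors.append(f"unexpected argument: {name}")
    # Type-check the arguments that are both declared and present.
    for name, expected in schema.items():
        if name in args and not isinstance(args[name], expected):
            got = type(args[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {got}")
    return errors
```

Rejecting unexpected arguments, rather than silently ignoring them, is what makes the check strict: a hallucinated argument such as `contents` is reported instead of being dropped on the way to the tool.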
Journey Context:
Standard monitoring checks whether a tool was called and whether it returned a 200 OK. But as underlying models are updated or prompts drift, an agent can call `create_file(path="...", content="...")` successfully while the `content` argument slowly degrades (e.g., added boilerplate, missing edge cases). Because the tool executes without error, the run is marked 'successful'. Only tracking the semantic drift of the arguments themselves against a golden dataset catches this silent degradation.
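One possible shape for such drift tracking is sketched below. It is a minimal sketch, not a verified workaround: `drift` and `DriftMonitor` are hypothetical names, the window size and threshold are arbitrary placeholders, and `difflib.SequenceMatcher` stands in for whichever edit-distance or embedding-distance measure is actually used.

```python
from collections import deque
from difflib import SequenceMatcher
from statistics import mean


def drift(candidate: str, golden: list[str]) -> float:
    """Distance in [0, 1] from the closest golden example (0 means identical)."""
    best = max(SequenceMatcher(None, candidate, g).ratio() for g in golden)
    return 1.0 - best


class DriftMonitor:
    """Keeps a rolling window of drift scores for one tool argument."""

    def __init__(self, golden: list[str], window: int = 50, threshold: float = 0.3):
        self.golden = golden
        self.scores: deque = deque(maxlen=window)  # old scores fall off automatically
        self.threshold = threshold  # assumed value; tune against real traffic

    def record(self, candidate: str) -> bool:
        """Record one observed argument; True means the rolling mean exceeds the threshold."""
        self.scores.append(drift(candidate, self.golden))
        return mean(self.scores) > self.threshold
```

Alerting on the rolling mean rather than on a single score is a deliberate choice here: one unusual call is normal variation, while a sustained rise across many otherwise successful runs is the silent degradation this report describes.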
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:28:32.570875+00:00— report_created — created