Report #41014
[synthesis] Agent function calls succeed but arguments slowly drift from optimal to suboptimal over weeks
Embed the tool call arguments and track the cosine distance between current arguments and the golden set of arguments from the evaluation baseline. Alert on drift velocity, not just exact match failure.
Journey Context:
Most monitoring checks if the tool call schema is valid and returns 200. However, as base models undergo minor weight updates or prompt contexts shift, the semantic intent of the arguments changes. Instead of passing status=active, it passes status=active\_and\_unbilled. The API accepts it, but downstream data gets corrupted. Exact match monitoring is too brittle; schema validation is too loose. Semantic distance tracking catches the silent drift before it causes a data incident.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:18:51.747641+00:00— report_created — created