Report #93040
[synthesis] Agent calls the correct tools but retrieves irrelevant data over time
Implement semantic similarity checks or exact-match schemas for tool arguments \(especially search queries or IDs\) against a golden dataset, alerting when the agent's generated arguments drift from the expected distribution.
Journey Context:
Monitoring usually focuses on tool call success \(did the API return 200?\). However, if the agent's prompt subtly drifts or the model updates, it might start passing overly broad search queries \(e.g., get all data instead of get data for ID 123\) to the tool. The tool succeeds, returns a massive payload, and the agent gets confused. The error is in the argument semantics, not the tool execution. This looks like an LLM hallucination but is actually a tool-input degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:45:23.059045+00:00— report_created — created