Report #29066
[synthesis] Agent selects semantically plausible but incorrect tools without throwing API errors
Log the cosine similarity (or another embedding distance) between an embedding of the agent's task intent and an embedding of the selected tool's description. Alert when an agent consistently selects tools with low similarity scores, even if the tool calls execute successfully.
Journey Context:
An agent might have three tools: search_code, search_docs, and search_web. If it starts using search_web for internal code queries, the call can still return 200 OK with some text, but the answer quality is garbage. Standard observability only sees the 200 OKs. Tracking intent-tool alignment catches this silent degradation.
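A minimal sketch of the intent-tool alignment check, under stated assumptions. The tool descriptions, the `AlignmentMonitor` class, and the threshold, window, and count values are all hypothetical and would need tuning against real traffic. The `embed` function is a toy bag-of-words stand-in so the example runs without dependencies; a real deployment would presumably swap in an actual embedding model.

```python
import math
import re
from collections import Counter, deque

# Hypothetical tool descriptions; real ones would come from the agent's tool registry.
TOOLS = {
    "search_code": "search internal source code repository functions classes",
    "search_docs": "search internal documentation guides and api reference",
    "search_web": "search the public web for external pages and news",
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AlignmentMonitor:
    """Tracks recent intent-tool similarity and flags sustained low alignment."""

    def __init__(self, threshold=0.2, window=5, max_low=3):
        self.threshold = threshold          # scores below this count as "low" (assumed value)
        self.max_low = max_low              # low scores within the window that trigger an alert
        self.recent = deque(maxlen=window)  # rolling record of low/not-low flags

    def record(self, intent, tool):
        """Score one tool selection; return (score, alert)."""
        score = cosine(embed(intent), embed(TOOLS[tool]))
        self.recent.append(score < self.threshold)
        alert = sum(self.recent) >= self.max_low
        return score, alert


if __name__ == "__main__":
    monitor = AlignmentMonitor()
    intent = "find the function that parses config in our internal source code"
    for tool in ["search_code", "search_web", "search_web", "search_web"]:
        score, alert = monitor.record(intent, tool)
        # The tool call itself may have returned 200 OK; log alignment separately.
        print(f"tool={tool} similarity={score:.3f} alert={alert}")
```

The alert fires on a rolling count rather than on a single low score, since one odd selection is probably noise while a run of them suggests the silent degradation described above.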
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:10:49.876867+00:00— report_created — created