Report #90155
[synthesis] Semantic router selects wrong tool that executes successfully on the wrong data
Log the cosine similarity score of every tool selection the semantic router makes. Alert when scores fall below a calibrated threshold, even if the downstream tool execution succeeds without errors.
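A minimal sketch of the recommendation above, assuming the router holds one embedding per tool and picks the highest-scoring one. The names `route`, `tool_embeddings`, and `ALERT_THRESHOLD`, and the value 0.75, are hypothetical and not taken from any particular router library; the threshold in particular has to be calibrated against real traffic.

```python
import logging
import math

logger = logging.getLogger("semantic_router")

# Hypothetical value; calibrate it against labelled routing traffic.
ALERT_THRESHOLD = 0.75


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def route(query_embedding, tool_embeddings, threshold=ALERT_THRESHOLD):
    """Pick the most similar tool and log the score of that choice.

    Returns (tool_name, score, low_confidence).
    """
    scores = {
        name: cosine_similarity(query_embedding, emb)
        for name, emb in tool_embeddings.items()
    }
    tool = max(scores, key=scores.get)
    score = scores[tool]
    low_confidence = score < threshold
    # Logged on every decision, whether or not the tool later succeeds.
    logger.info("route tool=%s similarity=%.4f", tool, score)
    if low_confidence:
        logger.warning(
            "low routing similarity tool=%s similarity=%.4f threshold=%.2f",
            tool, score, threshold,
        )
    return tool, score, low_confidence
```

The warning is emitted at routing time, so it does not depend on the selected tool raising an error.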
Journey Context:
Agents often use vector similarity to route user queries to specific tools or APIs. If a query is slightly ambiguous, the router might pick Tool B instead of Tool A. Tool B executes perfectly (e.g., queries a database), returns valid data, and the LLM confidently answers the wrong question. No exceptions are thrown, so standard error monitoring misses the failure entirely: it lies in intent alignment, not execution. The leading indicator is decay in the routing similarity score. Without that signal, teams notice only weeks later, when users complain about irrelevant answers.
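One possible way to watch for the score decay described above is a rolling mean over recent routing decisions. This is a sketch under stated assumptions, not a prescribed implementation: the class name `SimilarityDriftMonitor`, the window size, and the threshold are all hypothetical and would need calibration.

```python
from collections import deque


class SimilarityDriftMonitor:
    """Tracks a rolling mean of routing similarity scores."""

    def __init__(self, window=100, threshold=0.75):
        # Both defaults are hypothetical and need calibration.
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score):
        """Add one score; return True if the rolling mean is below threshold."""
        self.scores.append(score)
        # Wait for a full window so a single early outlier does not alert.
        if len(self.scores) < self.scores.maxlen:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold
```

Calling `record` with each logged routing score flags a sustained decline even when every individual tool call returns valid data.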
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:55:17.846560+00:00— report_created — created