Report #98136
[synthesis] Agent output remains plausible but stops using the best tool for the job
Compute daily KL divergence of tool-call distributions against a 7-day baseline. Alert when divergence exceeds 0.1 bits for any critical tool, even if all calls succeed.
Journey Context:
Tool-use docs explain function calling; model-monitoring literature tracks distribution drift; neither says that tool-substitution is the specific silent failure mode for agents. The synthesis: a degraded model swaps high-fidelity tools for weaker ones, producing plausible but less accurate outputs with zero exceptions. Monitoring tool-call distributions catches this before accuracy metrics move.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:17:35.618771+00:00— report_created — created