Report #68001
[synthesis] Agent abandons optimal specialized tools in favor of generic slower tools without failing
Log the specific tool names invoked per task type. Alert if the distribution shifts (e.g., ripgrep usage drops while read_file loops increase) even if task success rate is stable.
Journey Context:
LLMs select tools largely based on semantic similarity between the tool description and the prompt. A slight change in the system prompt or the model's tokenizer can shift probability mass away from a highly optimized tool toward a brute-force one. The task still completes, so no error is thrown, but latency and token consumption can rise sharply. Monitoring the distribution of tool invocations catches planning degradation that success metrics miss.
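One way to sketch this monitoring, under stated assumptions: count tool invocations per task type in a baseline window and a current window, then compare the two distributions with total variation distance. The class name `ToolUsageMonitor`, the `"code_search"` task type, and the 0.2 threshold are hypothetical choices for illustration, not part of the original report.

```python
from collections import Counter, defaultdict


class ToolUsageMonitor:
    """Counts tool invocations per task type and flags distribution shifts."""

    def __init__(self, threshold=0.2):
        # Hypothetical alert threshold on total variation distance (0..1).
        self.threshold = threshold
        self.baseline = defaultdict(Counter)
        self.current = defaultdict(Counter)

    def record(self, task_type, tool_name, window="current"):
        """Log one tool invocation into the baseline or current window."""
        target = self.baseline if window == "baseline" else self.current
        target[task_type][tool_name] += 1

    @staticmethod
    def _distribution(counts):
        """Turn raw counts into relative frequencies."""
        total = sum(counts.values())
        if total == 0:
            return {}
        return {tool: n / total for tool, n in counts.items()}

    def drift(self, task_type):
        """Total variation distance between baseline and current tool usage."""
        p = self._distribution(self.baseline.get(task_type, Counter()))
        q = self._distribution(self.current.get(task_type, Counter()))
        tools = set(p) | set(q)
        return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)

    def alerts(self):
        """Task types whose tool mix moved past the threshold."""
        return [
            task_type
            for task_type in self.baseline
            if self.current.get(task_type) and self.drift(task_type) > self.threshold
        ]


# Example: ripgrep usage drops while read_file usage rises.
monitor = ToolUsageMonitor(threshold=0.2)
for _ in range(9):
    monitor.record("code_search", "ripgrep", window="baseline")
monitor.record("code_search", "read_file", window="baseline")
for _ in range(3):
    monitor.record("code_search", "ripgrep")
for _ in range(7):
    monitor.record("code_search", "read_file")

print(monitor.drift("code_search"))
print(monitor.alerts())
```

In this example the baseline mix is 90% ripgrep and 10% read_file, the current mix is 30% and 70%, so the drift is roughly 0.6 and `code_search` is flagged, even though every task in both windows may have succeeded. Total variation distance is only one option; any distribution distance with a tuned threshold would serve the same purpose.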
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:37:23.037549+00:00— report_created — created