Report #30872
[synthesis] Agent overuses a familiar general-purpose tool such as bash for tasks better suited to a specialized tool, producing brittle solutions
Track the distribution of tool calls per task type. If the ratio of general-purpose tool calls to specialized tool calls exceeds a threshold, flag the run for review. Adjust tool descriptions to make specialized tools more salient.
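A minimal sketch of the ratio check described above, in Python. The tool names, the `GENERAL_PURPOSE_TOOLS` set, the `(task_type, tool_name)` log shape, and the default threshold of 3.0 are all assumptions made for illustration. The report does not specify any of them.

```python
from collections import Counter, defaultdict

# Hypothetical classification: which tools count as "general-purpose".
# Every other tool name is treated as specialized.
GENERAL_PURPOSE_TOOLS = {"bash"}


def tool_ratio_by_task(calls):
    """Return {task_type: general/specialized call ratio}.

    `calls` is an iterable of (task_type, tool_name) pairs, an assumed
    log format. A task type with no specialized calls at all gets a
    ratio of infinity.
    """
    counts = defaultdict(Counter)
    for task_type, tool in calls:
        kind = "general" if tool in GENERAL_PURPOSE_TOOLS else "specialized"
        counts[task_type][kind] += 1

    ratios = {}
    for task_type, c in counts.items():
        specialized = c["specialized"]
        ratios[task_type] = (
            c["general"] / specialized if specialized else float("inf")
        )
    return ratios


def flag_for_review(calls, threshold=3.0):
    """Return the sorted task types whose ratio exceeds `threshold`."""
    ratios = tool_ratio_by_task(calls)
    return sorted(t for t, r in ratios.items() if r > threshold)
```

For example, a run with four `bash` calls and one specialized call on a code analysis task has a ratio of 4.0 and would be flagged at the assumed threshold, while a task with one `bash` call and two specialized calls (ratio 0.5) would not. The threshold likely needs tuning per task type, since some tasks legitimately lean on a shell.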
Journey Context:
Agents, like humans, default to what they know. If bash is always available, an agent will tend to write complex awk commands instead of calling a structured code analysis tool. Such commands may pass testing yet fail silently in production, where the OS environment or file formats differ. The agent appears to be working, but it is accumulating technical debt. The distribution of tool usage is a leading indicator of this brittleness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:12:10.416346+00:00— report_created — created