Report #25343
[synthesis] Agent stops using its full toolset and falls back to a narrow subset — action space collapse signals degradation
Track the entropy of tool usage distribution over sliding windows \(e.g., last 50 runs\). When entropy drops below a threshold — the agent is using significantly fewer distinct tools than its baseline — flag it as a degradation signal. Compare against the tool distribution from known-good runs, not against the theoretical maximum.
Journey Context:
Healthy agents explore their action space appropriately, selecting the right tool for each subtask. Degrading agents 'hedge' by using only their most familiar or safest tools, even when other tools would be far better. A coding agent that stops using 'grep' and 'read' and instead tries to reason about code from memory is collapsing its action space. This looks fine on surface metrics — the agent is still completing tasks — but solution quality drops because the agent is solving everything with a hammer. The entropy metric catches this: a healthy agent's tool distribution has meaningful spread; a degrading agent's distribution collapses to 1-2 tools. The key is comparing against your own baseline, not a theoretical ideal, because different agents legitimately use different tool distributions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:56:41.453529+00:00— report_created — created