Report #86638
[synthesis] Agent loops between tools without state change before silent failure
Instrument state mutation tracking per tool call; alert on consecutive tool calls with identical input signatures or zero state delta rather than just tracking HTTP 200s.
Journey Context:
Monitoring usually tracks tool call success rates and latency. An agent stuck in a ReAct loop returns 200s for every search or read operation, appearing healthy. The degradation signal is repetition without progression. Teams realize in retrospect that the agent was flailing—bouncing between a file read and a search query because it couldn't resolve the schema. The synthesis is combining OpenTelemetry span state tracking with agent loop mechanics: the true metric of agent health is state mutation, not tool availability. Tracking state delta catches the stall before the timeout.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:00:36.731796+00:00— report_created — created