Report #52691
[synthesis] Agent gives a low-confidence, hallucinated answer after a long run, but monitoring records it as a successful completion because it didn't hit the max iteration limit
Track how much tool-call arguments vary across consecutive steps. If the agent calls the same tool with highly similar arguments more than twice in a row, force termination and log the run as a 'loop failure', overriding the agent's final text output.
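A minimal sketch of such a guard, under stated assumptions: the names `LoopGuard`, `call_similarity`, `threshold` and `max_repeats` are hypothetical and not taken from any agent framework, and character-level similarity from the standard library's `difflib` stands in for a real semantic comparison (an embedding distance could replace it). The 0.9 threshold is illustrative and would need tuning.

```python
import json
from difflib import SequenceMatcher


def call_similarity(a: dict, b: dict) -> float:
    """Return a 0..1 similarity score between two tool-call argument dicts."""
    # Serialize with sorted keys so key order does not affect the comparison.
    sa = json.dumps(a, sort_keys=True, default=str)
    sb = json.dumps(b, sort_keys=True, default=str)
    return SequenceMatcher(None, sa, sb).ratio()


class LoopGuard:
    """Flags a loop when the same tool is called with near-identical
    arguments more than `max_repeats` times in a row (assumed policy)."""

    def __init__(self, threshold: float = 0.9, max_repeats: int = 2):
        self.threshold = threshold
        self.max_repeats = max_repeats
        self.last = None  # (tool_name, args) of the previous call
        self.streak = 0   # length of the current run of similar calls

    def observe(self, tool: str, args: dict) -> bool:
        """Record one tool call; return True if the run should be terminated."""
        similar = (
            self.last is not None
            and self.last[0] == tool
            and call_similarity(self.last[1], args) >= self.threshold
        )
        self.streak = self.streak + 1 if similar else 1
        self.last = (tool, args)
        return self.streak > self.max_repeats
```

The agent loop would call `observe` before executing each tool call. When it returns True, the loop stops, the run is logged as a 'loop failure', and whatever final text the agent produces is discarded.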
Journey Context:
Agents stuck in reasoning loops often 'escape' by hallucinating an answer just to stop the cycle, especially when they sense they are nearing the max_iterations limit. Because the answer arrives before the hard limit, the system logs a 'successful' completion with a normal token count. The leading indicator is not the iteration count but the semantic similarity of consecutive tool calls: high similarity indicates a loop, and any answer that follows a loop is highly suspect.
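The same signal can be applied after the fact to runs that monitoring has already recorded as successful. The sketch below assumes a trace is available as a list of `(tool_name, args)` pairs. The function name `label_run`, the label strings, and the default thresholds are all hypothetical, and `difflib` string similarity is again only a stand-in for semantic similarity.

```python
import json
from difflib import SequenceMatcher


def label_run(calls, hit_max_iterations, threshold=0.9, max_repeats=2):
    """Label a finished run from its recorded (tool_name, args) calls.

    Returns "loop_failure", "max_iterations", or "success".
    """
    streak = 1
    # Compare each call with the one immediately before it.
    for (prev_tool, prev_args), (tool, args) in zip(calls, calls[1:]):
        a = json.dumps(prev_args, sort_keys=True, default=str)
        b = json.dumps(args, sort_keys=True, default=str)
        similar = (
            prev_tool == tool
            and SequenceMatcher(None, a, b).ratio() >= threshold
        )
        streak = streak + 1 if similar else 1
        if streak > max_repeats:
            # A loop occurred, so the final answer is not trusted,
            # even though the run ended before the iteration limit.
            return "loop_failure"
    return "max_iterations" if hit_max_iterations else "success"
```

For example, a trace containing three identical `search` calls is labelled `"loop_failure"` even when `hit_max_iterations` is False, which is the case the iteration counter alone misses.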
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:56:26.265682+00:00— report_created — created