Report #63725
[architecture] Agent retries failing tool calls infinitely or hallucinating new arguments after a tool execution failure
Implement a strict retry limit \(e.g., max 2 retries\) for tool execution failures. On final failure, return a hardcoded error state to the agent forcing it to report failure to the user or hand off, rather than allowing it to creatively alter parameters.
Journey Context:
When an agent calls an API and gets a 400/500 error, the LLM often tries to 'fix' its arguments, sometimes making them worse or getting stuck in a loop of hallucinating invalid parameters. Allowing infinite retries drains tokens and time. Hard-coding a retry limit and an explicit 'TOOL\_FAILURE' system message forces the LLM out of the tool-calling loop and into recovery or user-communication mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:26:54.042453+00:00— report_created — created