Report #74412
[frontier] Agent makes tool calls with malformed arguments, then either repeats the same malformed call or gives up entirely instead of using the error to correct its input
Implement a self-correction loop: when a tool call fails validation, feed the error message, the expected schema, and the actual arguments back to the agent and ask it to fix the call. Set a maximum retry count of 3. Use structured error feedback that explicitly shows the mismatch between expected and actual arguments.
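The loop described above can be sketched roughly as follows. This is a minimal illustration, not a framework API: `call_with_self_correction`, `propose_args`, `validate`, and `execute` are hypothetical names, and `propose_args` stands in for whatever LLM call the agent actually makes.

```python
import json
from typing import Any, Callable, Dict, List, Optional

MAX_RETRIES = 3  # retry cap suggested in this report


def call_with_self_correction(
    propose_args: Callable[[List[str]], Dict[str, Any]],
    validate: Callable[[Dict[str, Any]], Optional[str]],
    execute: Callable[[Dict[str, Any]], Any],
    expected_schema: Dict[str, str],
    max_retries: int = MAX_RETRIES,
) -> Any:
    """Run a tool call, feeding validation errors back to the agent.

    propose_args is a stand-in for the LLM (an assumption of this sketch):
    it receives the feedback messages accumulated so far and returns the
    arguments for the next attempt. validate returns None on success or a
    specific error message describing the mismatch.
    """
    feedback: List[str] = []
    error: Optional[str] = None
    # One initial attempt plus up to max_retries corrections.
    for _ in range(max_retries + 1):
        args = propose_args(feedback)
        error = validate(args)
        if error is None:
            return execute(args)
        # Structured feedback: the error, what was expected, what was sent.
        feedback.append(json.dumps({
            "error": error,
            "expected_schema": expected_schema,
            "actual_arguments": args,
            "instruction": "Fix the arguments and call the tool again.",
        }, default=str))
    # Retry budget exhausted: stop instead of looping forever.
    raise RuntimeError(
        f"tool call still invalid after {max_retries} retries: {error}"
    )
```

The retry cap bounds the cost at four LLM calls per tool call in the worst case (the first attempt plus three corrections). After that the failure is raised to the caller rather than retried silently.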
Journey Context:
Default agent behavior on tool call errors is either (a) surface the error to the user (bad UX, since the user didn't write the malformed call) or (b) retry the exact same call (the definition of insanity). Both fail because the LLM cannot tell what went wrong from a generic error.

The emerging pattern is structured self-correction: when a tool call fails validation, the agent receives a structured error object containing the error type, the expected schema, the actual arguments provided, and a specific message about the mismatch. The LLM then gets a chance to fix its call. This works because LLMs are surprisingly good at fixing their own mistakes when given clear, specific feedback; the problem is almost always a schema mismatch or type error, not a fundamental reasoning failure. The instructor library builds this pattern in as a default with its retry mechanism.

Tradeoffs: each retry costs an additional LLM call, and in rare cases the agent enters a correction loop (bounded by the max retry count). The key insight: most tool call failures are fixable if you give the agent the right diagnostic information. Generic error messages ('tool call failed') don't help; specific feedback ('expected integer for parameter limit, got string "10"') does. This pattern is moving from library level (instructor) to framework level, and will likely become expected default behavior in production agent frameworks.
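As an illustration of the structured error object described above, the sketch below validates arguments against a deliberately simplified schema (parameter name mapped to a Python type) and reports each mismatch explicitly. The function name `validate_args`, the schema format, and the field names are all assumptions made for this example; a real agent would more likely validate against JSON Schema or a Pydantic model.

```python
from typing import Any, Dict, List, Optional

# Hypothetical, simplified schema format: parameter name -> expected type.
Schema = Dict[str, type]


def validate_args(
    tool: str, schema: Schema, args: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return None if args match the schema, else a structured error object."""
    mismatches: List[str] = []
    for name, expected in schema.items():
        if name not in args:
            mismatches.append(f"missing required parameter {name}")
        elif type(args[name]) is not expected:
            # Exact type check, so "10" (str) does not pass for int.
            mismatches.append(
                f"expected {expected.__name__} for parameter {name}, "
                f"got {type(args[name]).__name__} {args[name]!r}"
            )
    for name in args:
        if name not in schema:
            mismatches.append(f"unexpected parameter {name}")
    if not mismatches:
        return None
    return {
        "error_type": "validation_error",
        "tool": tool,
        "expected_schema": {k: v.__name__ for k, v in schema.items()},
        "actual_arguments": args,
        "mismatches": mismatches,
    }
```

Calling `validate_args("search", {"limit": int}, {"limit": "10"})` produces an error object whose `mismatches` list names the parameter, the expected type, and the offending value, which is the kind of specific feedback the model needs to correct the call.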
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:29:49.354023+00:00— report_created — created