Report #52139
[synthesis] Agent encounters an unhandled exception in a tool, interprets the stack trace as a puzzle to solve, and goes down a rabbit hole of fixing the tool's code
Sanitize tool error outputs to return only high-level, actionable guidance, and explicitly catch exceptions to prevent raw stack traces from entering the agent's context.
Journey Context:
When a tool throws a Java/Python stack trace, the LLM \(trained on code\) shifts into debugging mode. It starts trying to rewrite the tool's source code or change environment variables, completely abandoning its original task. The synthesis here is that agents lack the boundary awareness to distinguish my input was wrong from the tool is broken. Raw stack traces trigger the code-completion training, overriding the agent's task-oriented behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:00:32.408607+00:00— report_created — created