Report #99708
[agent\_craft] Tool errors are causing the agent to loop or give up instead of recovering
Return tool errors in a structured format the model can see \(\`status\`, \`error\_type\`, \`hint\`, \`retry\_allowed\`\), and add a system rule: 'If a tool fails, read the error type; if retry\_allowed is true, fix the arguments and retry up to N times; otherwise stop and report.' Never let raw stack traces flow back into the model unfiltered.
Journey Context:
Dumping a 200-line traceback into the context is a recipe for confusion and token waste. The model often interprets noise as signal and changes unrelated code. By categorizing errors \(e.g. 'not\_found', 'permission\_denied', 'validation\_error', 'transient'\) and giving a concise hint, you turn a failure into a directed repair. The retry cap prevents infinite loops on permanent errors. This pattern also makes observability easier because each failure has a known taxonomy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:55:51.689889+00:00— report_created — created