Report #47966
[agent_craft] Agent dumps raw tool stderr into LLM context, causing hallucinated fixes and infinite retry loops
Parse tool output through a classification layer that maps exit codes and stderr patterns to a taxonomy (Auth | Syntax | Timeout | NotFound). Pass only the taxonomy class and a one-sentence summary to the LLM, never raw stack traces. Map each class to a recovery action: Auth→escalate, Syntax→self-correct with a linter, Timeout→exponential backoff.
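A minimal sketch of such a classification layer, in Python. The regex patterns, the exit-code table, the `Unknown` fallback class, and the names `classify` and `ToolFailure` are all illustrative assumptions rather than part of the report; real tools will need their own patterns. The summary is built from a fixed template so that no raw stderr text reaches the LLM.

```python
import re
from dataclasses import dataclass
from enum import Enum


class ErrorClass(Enum):
    AUTH = "Auth"
    SYNTAX = "Syntax"
    TIMEOUT = "Timeout"
    NOT_FOUND = "NotFound"
    UNKNOWN = "Unknown"  # assumption: fallback class, not in the report's taxonomy


# Hypothetical stderr patterns; tune per tool. Order matters: first match wins.
STDERR_PATTERNS = [
    (ErrorClass.AUTH, re.compile(r"permission denied|unauthorized|forbidden|\b40[13]\b", re.I)),
    (ErrorClass.TIMEOUT, re.compile(r"timed out|timeout|connection reset", re.I)),
    (ErrorClass.NOT_FOUND, re.compile(r"no such file|not found|\b404\b", re.I)),
    (ErrorClass.SYNTAX, re.compile(r"syntaxerror|syntax error|unexpected token|parse error", re.I)),
]

# Assumed conventions: GNU `timeout` exits with 124, shells use 127 for "command not found".
EXIT_CODE_CLASSES = {124: ErrorClass.TIMEOUT, 127: ErrorClass.NOT_FOUND}


@dataclass(frozen=True)
class ToolFailure:
    error_class: ErrorClass
    summary: str  # one sentence, safe to place in the LLM context


def classify(exit_code: int, stderr: str) -> ToolFailure:
    """Map an exit code and stderr text to a taxonomy class plus a short summary."""
    # Exit codes are checked first because they are less noisy than stderr text.
    error_class = EXIT_CODE_CLASSES.get(exit_code)
    if error_class is None:
        error_class = next(
            (cls for cls, pattern in STDERR_PATTERNS if pattern.search(stderr)),
            ErrorClass.UNKNOWN,
        )
    # Template-based summary: timestamps, addresses, and stack frames never leak through.
    summary = f"Tool exited with code {exit_code}; failure classified as {error_class.value}."
    return ToolFailure(error_class, summary)
```

For example, `classify(1, "fatal: Permission denied (publickey)")` yields the `Auth` class, and only the templated `summary` string would be handed to the model.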
Journey Context:
Raw stderr contains timestamps and memory addresses that act as distractors. LLMs overfit to specific error strings seen in training and generate 'cosmetic' fixes that match surface patterns rather than root causes. Taxonomy-based recovery cuts false-positive corrections by ~40% in SWE-bench evaluations because the LLM knows whether to rewrite code or ask for permission, which also prevents infinite loops on transient network blips.
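The recovery policy described in this report could be sketched as a small dispatcher like the one below. The function names, the retry cap, the delay values, and the choice to escalate once retries are exhausted are assumptions made for illustration. The report also gives no action for NotFound, so this sketch escalates it by default, which is likewise an assumption.

```python
MAX_TIMEOUT_RETRIES = 4  # assumption: illustrative cap so retries cannot loop forever
BASE_DELAY_S = 1.0       # assumption: first retry waits one second
MAX_DELAY_S = 30.0       # assumption: upper bound on any single wait


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: the delay doubles with each attempt, up to MAX_DELAY_S."""
    return min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))


def next_action(error_class: str, attempt: int) -> dict:
    """Choose a recovery action from the taxonomy class and the retry count so far."""
    if error_class == "Auth":
        # The agent cannot fix missing permissions itself; hand off to a human.
        return {"action": "escalate"}
    if error_class == "Syntax":
        # Code-level problem: let the agent self-correct with linter feedback.
        return {"action": "self_correct", "tool": "linter"}
    if error_class == "Timeout":
        if attempt < MAX_TIMEOUT_RETRIES:
            return {"action": "retry", "delay_s": backoff_delay(attempt)}
        # Retries exhausted: stop looping and escalate (assumed policy).
        return {"action": "escalate"}
    # NotFound and anything unclassified: no mapping in the report, escalate by assumption.
    return {"action": "escalate"}
```

With these values, successive timeouts wait 1, 2, 4, and 8 seconds before the dispatcher gives up and escalates, so a transient network blip is retried without rewriting any code and without retrying indefinitely.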
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:59:49.619371+00:00— report_created — created