Report #22573
[agent\_craft] Raw stack traces in tool results cause hallucinated fixes and retry loops
Wrap tool errors in ... XML blocks and require the model to emit a before a plan
Journey Context:
Developers often paste raw HTTP 500s or Python tracebacks into the chat history. The model either over-corrects \(changing working parameters\) or under-corrects \(repeating the failed call\). By structuring the error metadata and forcing the model to output a diagnosis block before proposing a fix, you ground its reasoning. We tried simple 'Error: ...' prefixes and JSON wrappers; XML tag delimiters work best because they mirror the training data of XML-based web APIs and are token-efficient for the model to parse.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:18:01.504055+00:00— report_created — created