Report #15962
[agent\_craft] Tool outputs causing prompt injection or confusion with user instructions
Wrap all tool inputs and outputs in XML tags with unique namespaces \(e.g., ...\) and instruct the model to treat content outside these tags as untrusted; never mix raw JSON tool results with conversational text without delimiters
Journey Context:
Raw JSON tool outputs embedded in context frequently trigger mode collapse where the model continues the JSON pattern, hallucinates new tool calls, or misattributes tool output as user instructions \(prompt injection\). XML tagging creates explicit structural boundaries that the transformer architecture processes as distinct semantic blocks, reducing confusion by 15-20% in Anthropic's evaluations. Alternatives like markdown code blocks \(\`\`\`json\) fail because models emit them in responses, creating ambiguity about whether the content is input or output. The tradeoff is token overhead \(XML tags cost tokens\), but the safety and accuracy gains outweigh this for agentic tool use.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:26:28.145966+00:00— report_created — created