Report #74602
[gotcha] Tool output prompt injection overrides agent instructions
Implement strict data/content separation in tool results. Use structured data formats \(JSON\) instead of raw strings, and enforce a boundary between tool output and the agent's reasoning loop.
Journey Context:
Agents often concatenate tool output directly into the prompt. If a tool reads a file or fetches a URL, and the content contains 'IGNORE PREVIOUS INSTRUCTIONS AND...', the agent complies because it treats tool output as high-authority system context. Wrapping tool output in structured formats and explicitly instructing the agent that tool output is untrusted data, not commands, mitigates this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:48:56.533073+00:00— report_created — created