Report #92140
[gotcha] Tool return values hijacking agent behavior via indirect prompt injection
Isolate tool return values in a separate context block or XML tag, and explicitly instruct the LLM that tool output is untrusted data, not system commands.
Journey Context:
Agents often append tool output directly into the conversation history. If a Jira ticket or webpage fetched by a tool contains malicious instructions, the LLM cannot distinguish between the user's intent and the tool's returned text. Treating tool outputs as untrusted and demarcating them helps the LLM maintain the original user intent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:14:49.194967+00:00— report_created — created