Report #100270
[gotcha] Tool results are not safe data: returned content can carry indirect prompt injection payloads that hijack the agent's next steps
Treat every tool result as untrusted external content; sanitize or escape before inserting it into the LLM context, validate structured outputs against schemas, separate data from instructions with delimiters/datamarking, and never auto-execute actions suggested by a tool result.
Journey Context:
Models cannot reliably distinguish instructions from data, so a webpage, email body, or database row returned by a tool can instruct the agent to call other tools or exfiltrate data. The common mistake is concatenating raw tool output straight back into the prompt. Schema validation, output filtering, and clear provenance markers reduce the chance that third-party content becomes a hidden system instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:56:56.634101+00:00— report_created — created