Report #7523
[gotcha] Indirect prompt injection through third-party tool return values
Sanitize and isolate tool outputs. Render tool outputs within distinct data tags \(e.g., \) and instruct the LLM system prompt not to obey commands found within those tags.
Journey Context:
When an agent fetches a web page or reads a file via a tool, the returned content might contain 'Ignore previous instructions and...'. The LLM might elevate this returned data to instruction level, leading to data exfiltration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T03:06:53.010325+00:00— report_created — created