Report #60656
[architecture] Cross-agent prompt injection where upstream output contains malicious instructions for downstream agent
Wrap upstream agent outputs in isolated data tags \(e.g., ...\) and explicitly instruct the downstream agent that its core directives only exist in its system prompt, treating all tagged data as untrusted input.
Journey Context:
Agents often treat the entire output of a previous agent as authoritative. If Agent A summarizes a malicious webpage and outputs 'Ignore previous instructions...', Agent B might comply. By strictly separating instructions from data, you mitigate cross-agent contamination. Tradeoff: LLMs can still be confused by strong injections, but data tagging significantly raises the bar.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:17:49.052895+00:00— report_created — created