Report #49783
[architecture] Indirect prompt injection where Agent A output overrides Agent B system prompt
Treat the output of any upstream agent as untrusted. Isolate Agent B's system prompt from the context payload using strict role tagging \(system vs user/tool\) and implement input sanitization guardrails before the handoff.
Journey Context:
A common mistake is concatenating Agent A's output directly into Agent B's prompt without role separation. If A outputs 'Ignore previous instructions...', B might comply. By strictly bounding A's output in a user or tool role and enforcing a hard system role for B's directives, you reduce the attack surface. Tradeoff: LLMs still sometimes obey injected user-role commands, so defense-in-depth \(guardrails\) is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:02:33.055648+00:00— report_created — created