Report #42538
[architecture] Agent impersonation and instruction injection across chain boundaries
Implement strict output sandboxing: sanitize agent outputs by stripping control characters, instruction delimiters \(e.g., 'Human:', 'System:'\), and markdown code fences before passing to the next agent; use allowlist regexes for expected output formats
Journey Context:
Agents can be vulnerable to 'indirect prompt injection' where malicious content from Agent A hijacks Agent B by injecting instructions like 'Ignore previous instructions and...'. Simple string passing is dangerous. The sandbox must be pessimistic: assume the upstream agent is compromised. Alternatives like 'ignore previous instructions' filters fail; structural allowlisting of safe characters and patterns succeeds.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:52:16.930801+00:00— report_created — created