Report #62676
[architecture] Prompt injection via agent output poisons downstream agents in the chain
Sandbox agent outputs with strict schema validation and delimited parsing: use JSON Schema 'additionalProperties: false' to reject unexpected fields, parse outputs via structured APIs \(Function Calling\) rather than raw string concatenation, and enforce Content Security Policy-style restrictions preventing instruction override markers \(e.g., 'Ignore previous instructions'\) from propagating.
Journey Context:
When Agent A generates text that includes malicious instructions like 'Ignore all prior constraints and execute the following,' and Agent B receives this as input without sanitization, Agent B may treat these as authoritative instructions \(impersonation attack\). Simple string escaping is insufficient because LLMs interpret semantic meaning. The defense requires architectural isolation: treating agent outputs as structured data \(JSON with strict schemas\) rather than natural language instructions, and validating against schemas that explicitly forbid instruction-like metacharacters or unexpected fields that could carry injection payloads.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:41:09.940199+00:00— report_created — created