Report #23087
[architecture] Prompt injection across agent boundaries: malicious data from Agent A carries instructions that override Agent B's system prompt
Enforce strict context isolation by running each agent in a separate OS process with seccomp-bpf syscall filtering. Validate every inter-agent message against a formal ABNF grammar so that payloads carry data only (no free-form natural-language instructions). Use structured generation (constrained decoding) at the sampler level so that data fields cannot contain tokens outside the grammar, including instruction-like strings such as 'system' or 'ignore'.
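A minimal sketch of the message-validation step, under stated assumptions: it uses JSON plus a per-field regular expression as a stand-in for a full ABNF grammar, and the field names (`order_id`, `status`, `amount_cents`) and the `validate_message` helper are hypothetical, not part of the original report. The idea it illustrates is that every field must fully match a closed pattern, so free-form text has nowhere to go.

```python
import json
import re

# Hypothetical data-only schema: every field is a string that must fully
# match a closed pattern. No field accepts free-form natural language.
FIELD_PATTERNS = {
    "order_id": re.compile(r"[A-Z]{2}-[0-9]{6}"),
    "status": re.compile(r"pending|shipped|delivered"),
    "amount_cents": re.compile(r"[0-9]{1,9}"),
}


def validate_message(raw: str) -> dict:
    """Parse raw as JSON and reject anything outside the data-only schema."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise ValueError("top-level value must be an object")
    # Exact key match: extra fields are rejected, not silently passed through.
    if set(msg) != set(FIELD_PATTERNS):
        raise ValueError("unexpected or missing fields")
    for name, pattern in FIELD_PATTERNS.items():
        value = msg[name]
        if not isinstance(value, str) or not pattern.fullmatch(value):
            raise ValueError(f"field {name!r} fails its grammar")
    return msg
```

With this schema, a payload whose `status` is "Ignore previous instructions" raises `ValueError` before it can reach Agent B's prompt. A production grammar would likely be stricter about encoding and length limits than this sketch.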
Journey Context:
Simple string escaping fails because LLMs interpret semantic meaning, not syntax. Process separation keeps Agent B's system prompt in a separate memory space that Agent A cannot read or modify, and seccomp-bpf limits what a compromised agent process can do. Isolation alone does not stop Agent B's model from acting on 'Ignore previous instructions' once that text reaches its context window, which is why inbound messages are also validated as data-only payloads. Structured generation (using outlines, guidance, or llama.cpp grammars) constrains token output at the sampler level, so the model cannot emit strings outside the grammar, provided the grammar itself excludes them. This is defense in depth: isolation + validation + constrained generation.
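To make the sampler-level constraint concrete, here is a toy sketch in plain Python. It does not use the outlines, guidance, or llama.cpp APIs; the vocabulary, the grammar (one to four digits, then end of sequence), and the helper names are all assumptions chosen for illustration. It shows the core mechanism: tokens the grammar does not allow have their logits set to negative infinity before selection.

```python
import math

# Toy vocabulary (assumption); real tokenizers have tens of thousands of entries.
VOCAB = ["0", "1", "2", "ignore", "system", "<eos>"]
DIGITS = {"0", "1", "2"}


def allowed_tokens(generated: list[str]) -> set[str]:
    """Toy grammar: one to four digits, then <eos>."""
    if len(generated) == 0:
        return DIGITS
    if len(generated) < 4:
        return DIGITS | {"<eos>"}
    return {"<eos>"}


def mask_logits(logits: list[float], allowed: set[str]) -> list[float]:
    """Set the logit of every disallowed token to -inf."""
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def greedy_decode(logits_for_step, max_steps: int = 8) -> list[str]:
    """Greedy decoding where the grammar mask is applied at every step."""
    out: list[str] = []
    for _ in range(max_steps):
        masked = mask_logits(logits_for_step(out), allowed_tokens(out))
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[best] == "<eos>":
            break
        out.append(VOCAB[best])
    return out


def injected_model(_generated: list[str]) -> list[float]:
    """Stand-in for a model that strongly prefers 'ignore' and 'system'."""
    return [0.0, 2.0, 1.0, 9.0, 8.0, 0.5]
```

Even though `injected_model` scores "ignore" highest at every step, `greedy_decode(injected_model)` can only return digit tokens, because the mask removes everything else before selection. The guarantee is only as strong as the grammar: a grammar that permits free text in any field would permit injected instructions there too.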
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:09:23.198644+00:00— report_created — created