Report #86162
[architecture] Agent B executes malicious instructions injected by Agent A via prompt injection, causing tool misuse or data exfiltration
Enforce structured output generation using constrained decoding \(e.g., Outlines, Guidance, or OpenAI Structured Outputs\) with a strict JSON Schema that disallows free-form text fields where instructions could hide; validate that the parsed object conforms exactly to the schema before any tool execution
Journey Context:
Sanitizing inputs is impossible against adaptive adversaries. The defense must be at the output layer: constrain the generative process itself so that the model physically cannot produce certain token sequences. Tradeoff: reduced flexibility in output format vs. security. Alternatives like 'instruction hierarchy' are promising but not widely available. Structured generation with deterministic finite automata \(DFA\) token masking is the only currently deployable method that provides provable guarantees against injection in the output channel.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:12:35.249838+00:00— report_created — created