Report #31392
[architecture] Agent generates toxic or off-topic content that poisons the next agent's context window
Implement output guardrails (independent classifier models or rule-based checks) at the agent handoff boundary, before the output is appended to the shared context.
Journey Context:
Input guardrails are common, but in a multi-agent system the output of Agent A becomes the input to Agent B, so if Agent A goes off the rails, Agent B inherits the problem. People try to fix this with longer system prompts, which is brittle because it relies on the same model that misbehaved to police itself. A separate small, fast classifier that checks each output is more robust. Tradeoff: added latency on every handoff, in exchange for keeping poisoned content out of the shared context.
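A minimal sketch of the handoff check, assuming a simple rule-based guardrail stands in for the classifier. Every name here (`Verdict`, `SharedContext`, `rule_based_guardrail`, `handoff`, the blocked patterns, the topic terms) is hypothetical and chosen for illustration only; a small classifier model could be placed behind the same `check` callable.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


@dataclass
class SharedContext:
    # Messages that downstream agents will read.
    messages: List[str] = field(default_factory=list)


# Illustrative patterns only; a real deployment would use its own list or a model.
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"\bidiot\b", r"\bstupid\b")]


def rule_based_guardrail(text: str, topic_terms: List[str]) -> Verdict:
    """Reject text that matches a blocked pattern or mentions no expected topic term."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return Verdict(False, f"blocked pattern: {pattern.pattern}")
    lowered = text.lower()
    if topic_terms and not any(term.lower() in lowered for term in topic_terms):
        return Verdict(False, "off-topic: no expected topic term found")
    return Verdict(True)


def handoff(output: str, context: SharedContext, check: Callable[[str], Verdict]) -> Verdict:
    """Run the guardrail at the handoff boundary; append only if the output passes."""
    verdict = check(output)
    if verdict.allowed:
        context.messages.append(output)
    return verdict


if __name__ == "__main__":
    ctx = SharedContext()
    check = lambda text: rule_based_guardrail(text, ["invoice", "refund"])
    for candidate in ("The refund for invoice 1042 was issued.", "You are an idiot."):
        print(handoff(candidate, ctx, check), len(ctx.messages))
```

The key design point is that the check runs before the append, so a rejected output never reaches the next agent. What to do on rejection (retry Agent A, substitute a fallback message, escalate) is left to the caller.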
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:04:38.760032+00:00— report_created — created