Report #17794
[gotcha] Why do safety guardrails break after adding more MCP servers?
Cap the total token budget allocated to tool descriptions across all servers. Truncate or summarize long tool descriptions before they are injected into the context. Measure context window utilization after each MCP server registration. Reject or defer registration of any server whose tool descriptions exceed a per-server token limit.
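One way the caps above could be enforced at registration time is sketched below. This is a minimal illustration, not part of any MCP SDK: `ToolBudget`, `register`, the limit values, and the 4-characters-per-token estimate are all assumptions made for the example. A real deployment would count tokens with the tokenizer of the model in use.

```python
from dataclasses import dataclass, field

# Assumption: roughly 4 characters per token. Replace with a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate (ceiling of len / CHARS_PER_TOKEN)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Cut a description down to at most max_tokens estimated tokens."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3].rstrip() + "..."


@dataclass
class ToolBudget:
    """Hypothetical gatekeeper for tool-description tokens."""
    total_limit: int = 4000       # cap across all servers (illustrative value)
    per_server_limit: int = 1000  # cap for a single server (illustrative value)
    per_tool_limit: int = 200     # descriptions are truncated to this length
    used: int = 0
    admitted: dict = field(default_factory=dict)

    def register(self, server: str, tools: dict[str, str]) -> bool:
        """Admit a server's tools if they fit the budget; otherwise reject."""
        # Truncate each description before measuring its cost.
        trimmed = {
            name: truncate_to_tokens(desc, self.per_tool_limit)
            for name, desc in tools.items()
        }
        cost = sum(estimate_tokens(desc) for desc in trimmed.values())
        if cost > self.per_server_limit:
            return False  # server exceeds its own limit even after truncation
        if self.used + cost > self.total_limit:
            return False  # admitting it would exceed the global budget
        self.admitted[server] = trimmed
        self.used += cost
        return True
```

A caller would treat a `False` return as "reject or defer" and surface it to the operator rather than silently dropping the server.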
Journey Context:
Every MCP server registers tools whose descriptions are injected into the model's context and consume context window tokens. A malicious or poorly designed server can register tools with extremely verbose descriptions, thousands of tokens each, that take up the majority of the context window. This silently crowds out system instructions, safety guardrails, and few-shot examples. The LLM then follows instructions less reliably because the signal-to-noise ratio in its context collapses. The attack is insidious because it doesn't look like an attack, just 'detailed documentation.' The agent's behavior degrades subtly: it starts ignoring safety constraints, forgetting format requirements, and making errors that look like normal LLM limitations rather than the result of context starvation.
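To make the crowding-out effect concrete, the following sketch computes how much of the context window tool descriptions occupy and which server contributes most. The function name `tool_share_report`, the 25% warning threshold, and the token counts in the usage note are hypothetical values chosen for illustration, not measurements.

```python
def tool_share_report(window_tokens, instruction_tokens,
                      tool_tokens_by_server, warn_share=0.25):
    """Summarize how much of the context window tool descriptions consume.

    warn_share is an assumed threshold; tune it for the deployment.
    """
    tool_total = sum(tool_tokens_by_server.values())
    # Tokens left for conversation history and the model's output.
    remaining = window_tokens - instruction_tokens - tool_total
    share = tool_total / window_tokens
    heaviest = (
        max(tool_tokens_by_server, key=tool_tokens_by_server.get)
        if tool_tokens_by_server else None
    )
    return {
        "tool_share": share,
        "remaining_tokens": remaining,
        "heaviest_server": heaviest,
        "warn": share > warn_share or remaining < 0,
    }
```

For example, with an 8,000-token window, 1,000 tokens of system instructions, and two servers contributing 400 and 6,000 tokens, the report shows tool descriptions taking 80% of the window and only 600 tokens left for everything else, with the second server identified as the heaviest.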
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:22:35.011832+00:00— report_created — created