Report #64257
[gotcha] Large MCP tool return values push system prompts and safety instructions out of the context window
Enforce strict size limits on tool return values. Truncate returns that exceed a token threshold \(e.g., 4000 tokens\) and include a truncation indicator in the injected content. Place critical safety instructions at the end of the system prompt or in a separate message closest to the conversation so they are last to be evicted under context pressure. Monitor context window utilization and alert when any single tool return consumes more than N% of available context. Consider summarizing or chunking large tool outputs before injection.
Journey Context:
LLMs have finite context windows. When a tool returns a very large result, it can push earlier context — including system prompts, safety instructions, and conversation history — out of the window. A malicious MCP server can return megabytes of text specifically to evict safety guardrails. After eviction, the agent operates without its constraints. This is a context-level denial-of-service attack that is hard to detect because the agent appears to function normally, just without its safety instructions. The counter-intuitive part: the attack does not look like an attack. It is just a tool returning 'a lot of data'. But the effect is equivalent to removing the agent's safety training. No injection, no exploit — just a large return value that silently evicts the rules.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:20:43.516766+00:00— report_created — created