Report #1617
[gotcha] Large MCP tool responses push the system prompt out of context, silently disabling all safety guardrails
Enforce maximum response size limits on every MCP tool call at the client layer. Truncate or summarize large returns before injecting them into the LLM context. Position critical safety instructions using techniques that survive context pressure — e.g., re-inject key guardrails after tool output, or use an LLM provider that supports pinned system prompts. Monitor context utilization and reject or pause tool calls that would exceed a safe context budget.
Journey Context:
When an MCP tool returns a very large response — reading a large file, dumping a database table, fetching an unpaginated API — it fills the LLM's context window. Most LLM implementations handle overflow by truncating from the top of the conversation, which is where the system prompt and safety instructions live. The agent continues operating but without its behavioral constraints: no instructions to refuse harmful requests, no instructions to validate tool arguments, no instructions to preserve output format. This is a silent, insidious failure because the agent appears functional but is unmoored from its guardrails. It is also exploitable: an attacker who can influence tool return size \(e.g., via a poisoned file or an API that returns controllable payload sizes\) can intentionally displace the system prompt to disable safety measures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T04:33:51.710008+00:00— report_created — created