Report #97907
[gotcha] MCP server with 50+ tools silently consumes half the context window before any user message
Use Tool Search / deferred loading (defer_loading: true) so only tool names live in context and full schemas are hydrated on demand. For servers you control, consolidate operations into a small registry-dispatch surface (e.g., 10-15 high-level tools) instead of exposing every CRUD endpoint.
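A minimal sketch of the registry-dispatch idea from the fix above, written in Python. It is an illustration under stated assumptions, not Harness's or any MCP SDK's actual implementation: the names `REGISTRY`, `register`, `dispatch`, and `DISPATCH_TOOL`, and the two notification handlers, are hypothetical. The point is that the model sees one compact tool schema while the server maps (resource, action) pairs to handlers.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical registry: (resource, action) -> handler. All names are illustrative.
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
REGISTRY: Dict[Tuple[str, str], Handler] = {}


def register(resource: str, action: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under a (resource, action) key."""
    def decorator(fn: Handler) -> Handler:
        REGISTRY[(resource, action)] = fn
        return fn
    return decorator


@register("notification", "send_user")
def send_user(params: Dict[str, Any]) -> Dict[str, Any]:
    return {"sent_to": f"user:{params['user_id']}", "text": params["text"]}


@register("notification", "send_channel")
def send_channel(params: Dict[str, Any]) -> Dict[str, Any]:
    return {"sent_to": f"channel:{params['channel']}", "text": params["text"]}


# The single schema exposed to the model, instead of one schema per endpoint.
DISPATCH_TOOL = {
    "name": "dispatch",
    "description": "Run an action on a resource, e.g. notification/send_user.",
    "input_schema": {
        "type": "object",
        "properties": {
            "resource": {"type": "string"},
            "action": {"type": "string"},
            "params": {"type": "object"},
        },
        "required": ["resource", "action"],
    },
}


def dispatch(resource: str, action: str,
             params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Route a call to its handler; list valid operations when the key is unknown."""
    handler = REGISTRY.get((resource, action))
    if handler is None:
        known = sorted(f"{r}.{a}" for r, a in REGISTRY)
        return {"error": f"unknown operation {resource}.{action}", "available": known}
    return handler(params or {})
```

Returning the list of available operations on a miss gives the model a way to recover without every operation's schema sitting in context up front.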
Journey Context:
Anthropic measured a five-server setup (GitHub, Slack, Sentry, Grafana, Splunk) at 58 tools and ~55K tokens of definitions before the first turn; internal setups hit 134K tokens. That overhead not only burns budget but also degrades tool-selection accuracy: models confuse similarly named tools such as notification-send-user and notification-send-channel. Cursor, OpenAI, and Claude all impose hard tool-count ceilings (~80-128) because reliability drops sharply beyond that range. The counter-intuitive insight from Harness's MCP v2 redesign is that the answer is not fewer features but a different architecture: a registry dispatch model in which the LLM reasons about what to do and the server decides how, cutting context consumption from ~26% to ~1.6%. Lazy loading trades a small search latency for a large context win and is the right default once you exceed ~30 tools or 10K tokens of definitions.
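The threshold and the lazy-loading trade-off described above can be sketched in Python as follows. This is a rough illustration, not the Tool Search implementation of any vendor: `estimate_tokens`, `should_defer`, `names_only`, and `search_tools` are hypothetical helpers, the 4-characters-per-token ratio is a crude assumption (use a real tokenizer for accurate counts), and the keyword match stands in for whatever search a real client performs.

```python
import json
from typing import Any, Dict, List

Tool = Dict[str, Any]


def estimate_tokens(obj: Any) -> int:
    # Assumption: roughly 4 characters per token of serialized JSON.
    return len(json.dumps(obj)) // 4


def should_defer(tools: List[Tool], max_tools: int = 30,
                 max_tokens: int = 10_000) -> bool:
    """Apply the rule of thumb: defer past ~30 tools or ~10K tokens of definitions."""
    total = sum(estimate_tokens(t) for t in tools)
    return len(tools) > max_tools or total > max_tokens


def names_only(tools: List[Tool]) -> List[str]:
    """What stays in context up front when loading is deferred."""
    return [t["name"] for t in tools]


def search_tools(tools: List[Tool], query: str, limit: int = 5) -> List[Tool]:
    """Hydrate full schemas on demand via a naive keyword match."""
    terms = query.lower().split()
    scored = []
    for tool in tools:
        haystack = f"{tool['name']} {tool['description']}".lower()
        score = sum(term in haystack for term in terms)
        if score:
            scored.append((score, tool))
    # Sort on the score only; the sort is stable, so ties keep registration order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]
```

In this sketch the model starts with only the output of `names_only`, and each `search_tools` call costs one extra round trip in exchange for keeping the other schemas out of the window.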
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:54:13.588450+00:00— report_created — created