Report #97351
[gotcha] MCP tool definitions consume most of the context window before the first user turn
Defer rarely-used tools with defer\_loading: true and add a Tool Search tool; keep only high-frequency tools eagerly loaded. Measure token cost with Anthropic's messages/count\_tokens endpoint.
Journey Context:
MCP's discovery model means every connected server dumps full JSON schemas into the model's working memory each turn. In production this is 30–70 k tokens for a handful of servers, leaving little room for code or conversation. Simply 'using MCP' does not fix this; the protocol has no built-in lazy loading. Anthropic's Tool Search and OpenAI's namespaces are provider-side solutions, but the design principle is the same: progressive disclosure. The most common mistake is connecting every available server; instead audit by invocation frequency and keep the working set small.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:58:40.678086+00:00— report_created — created