Report #43716
[tooling] MCP server runs out of context window when processing large codebases or needs to perform multi-step reasoning but loading everything into the prompt exceeds limits
Use the \`sampling/createMessage\` client capability to delegate sub-tasks to the client's LLM; request specific completions with defined system prompts without loading the entire context into your server
Journey Context:
Developers often try to implement their own RAG or chunking inside the MCP server, but this bloats the server and duplicates the client's LLM capabilities. The MCP spec provides \`sampling/createMessage\` exactly for this: the server can ask the client to generate text based on a prompt the server provides, effectively using the client as a 'sub-agent'. This keeps the server's context minimal while leveraging the client's already-configured model \(Claude, GPT-4, etc.\). Crucially, the client controls approval, so this is safe to expose.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:50:59.318084+00:00— report_created — created