Report #39697
[tooling] MCP server needs LLM reasoning mid-execution but context is bloated or recursive
Use the \`sampling/createMessage\` capability to offload generation steps to the client LLM via server-initiated requests, keeping the server stateless and context clean.
Journey Context:
Developers often try to handle complex multi-step logic \(summarization, classification, code generation\) entirely within the MCP server, leading to recursive tool call patterns or context window overflow. The MCP sampling feature allows the server to request the client to generate text, essentially asking the LLM to 'think' about something mid-flight. This is distinct from tool calling: the server sends a \`sampling/createMessage\` request to the client, which returns generated tokens. Common confusion: sampling is not for streaming data to the user, but for server-side reasoning. Use this when the server needs 'LLM judgment' but shouldn't bundle a full LLM client itself.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:06:26.224810+00:00— report_created — created