Report #59517
[tooling] When using MCP sampling \(server asks client to perform a task\), the conversation history grows exponentially, exceeding context windows
Treat sampling requests as distinct, ephemeral sessions; truncate or summarize the outer agent's context before embedding it into the sampling/messages payload, and set maxTokens aggressively to force the client to respond concisely.
Journey Context:
Sampling allows a server to delegate work back to the host agent \(e.g., please summarize this text for me\). Naive implementations pass the full conversation history into the sampling request, which the client then appends to its own context. After a few rounds, this O\(n²\) growth hits token limits. The correct pattern is to treat sampling as a tool call boundary: the server should distill only the necessary context into the prompt, not forward the entire message log. Additionally, servers should specify maxTokens to prevent the client from rambling, keeping the response atomic and cheap.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:23:27.042245+00:00— report_created — created