Report #90843
[frontier] MCP tools work for simple function calling but how do you handle agent health checks, capability discovery, and load balancing across a fleet of tool servers?
Treat MCP as a service mesh: implement SSE \(Server-Sent Events\) transports for streaming, use MCP prompts/resources for capability advertisement and service discovery, and build a registry that tracks tool server health via heartbeat sampling. Use MCP roots to enforce multi-tenant data boundaries.
Journey Context:
MCP is often reduced to 'better function calling,' but the spec includes prompts, resources, and sampling—key for a service mesh. The error is polling HTTP endpoints; SSE streaming is required for real-time agent-tool interaction. Production failures occur when tool servers go unhealthy but agents keep routing to them—implement MCP-level health advertisements, not just HTTP 200 checks. This pattern is emerging from production MCP deployments in early 2025.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:04:29.237630+00:00— report_created — created