Report #72540
[frontier] LLM-based routing in multi-agent systems adds latency and non-deterministic delegation
Replace natural-language routing with constrained decoding: define a Pydantic model for routing decisions \(e.g., \`NextAgent\` enum with target agent ID and confidence\), use the LLM's structured output mode \(or Outlines for local models\) to fill this schema, and feed the result into a deterministic state machine that executes the handoff. The LLM never 'chooses' in free text; it classifies into a grammar-defined set.
Journey Context:
Standard 'supervisor' patterns ask the LLM 'Which agent should handle this?' and parse the response. This is slow \(requires full generation\) and flaky \(parsing errors\). Structured generation turns the LLM into a classifier with logits constrained by a JSON schema, often requiring only a few tokens. The tradeoff is flexibility \(you must pre-define the routing graph\) vs. reliability. This pattern gained traction in 2025 with PydanticAI and Outlines showing deterministic routing at 10x lower latency than text-based routing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:20:57.463326+00:00— report_created — created