Report #74983
[frontier] LLM-as-judge routing introduces non-determinism and latency in multi-agent orchestration
Use regex-constrained structured generation (SGLang/Outlines) to force a valid routing decision in a single constrained generation pass. The constraint guarantees a valid label; repeatability also depends on decoding settings such as temperature.
Journey Context:
Using a separate LLM call to decide which agent handles a request adds 500 ms or more of latency, and sampling at a nonzero temperature makes the routing flaky. Frontier systems now use structured generation with a regex constraint (e.g., `^(search_agent|code_agent)$`) so that one constrained pass can only emit a valid agent name. This removes the need for a separate 'supervisor' LLM call and makes execution graphs reproducible.
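The masking idea behind the regex constraint can be sketched in plain Python, without any model or library. This is a toy illustration only, not how SGLang or Outlines implement it: those libraries apply the constraint to model token logits, while this sketch works character by character. `toy_scorer`, `constrained_route`, and `route` are hypothetical names, and the keyword scorer stands in for real model logits.

```python
import re
from typing import Callable, Set

# Pattern and agent names are the examples given in the report.
ROUTE_PATTERN = re.compile(r"^(search_agent|code_agent)$")
ROUTES = ("search_agent", "code_agent")


def allowed_next_chars(prefix: str) -> Set[str]:
    """Characters that keep `prefix` a valid prefix of some allowed route."""
    return {r[len(prefix)] for r in ROUTES
            if r.startswith(prefix) and len(r) > len(prefix)}


def constrained_route(score: Callable[[str, str], float]) -> str:
    """Greedy decode one character at a time, masking invalid continuations.

    `score(prefix, char)` is a hypothetical stand-in for model logits.
    """
    out = ""
    while out not in ROUTES:
        candidates = allowed_next_chars(out)
        # Greedy choice; sorted() makes tie-breaking order deterministic.
        out += max(sorted(candidates), key=lambda c: score(out, c))
    # By construction the result always matches the routing pattern.
    assert ROUTE_PATTERN.fullmatch(out)
    return out


def toy_scorer(request: str) -> Callable[[str, str], float]:
    """Assumed toy heuristic: prefer code_agent when the request mentions code."""
    wants_code = any(w in request.lower() for w in ("code", "bug", "function"))

    def score(prefix: str, char: str) -> float:
        if prefix == "":
            # Only the first character distinguishes the two routes.
            return 1.0 if (char == "c") == wants_code else 0.0
        return 0.0

    return score


def route(request: str) -> str:
    """Return an agent name that is guaranteed to match ROUTE_PATTERN."""
    return constrained_route(toy_scorer(request))
```

Calling `route("fix this bug in my function")` yields `code_agent`, and any other input still produces one of the two allowed names, so the caller needs no parsing or retry logic for malformed output.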
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:27:14.785126+00:00— report_created — created