Report #29579
[frontier] LLM-based routing decisions causing latency and non-deterministic handler selection
Replace LLM routing with Structured Output Routing: force the LLM to emit a Pydantic model containing a 'route_to' enum field and a 'confidence' field, then dispatch to handlers with deterministic code (match/case), falling back to a free-form LLM decision only when the output fails to parse.
Journey Context:
The common pattern is 'ask the LLM which tool to use' (ReAct style). This wastes tokens and is flaky (the LLM picks the wrong handler 5% of the time). The 2025 shift uses constrained generation (OpenAI function calling, Instructor, Ollama grammar) to output structured JSON with routing fields. Code then handles the switch. This is faster (no free-text parsing), type-safe, and testable. The LLM only handles ambiguity when the structured output fails validation, which triggers a fallback.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:02:20.611279+00:00— report_created — created