Report #30576
[frontier] Using separate classifier models or regex parsing to route between agents/tools adds latency, cost, and failure modes that compound
Use constrained generation (JSON Schema/EBNF grammars) to force the LLM to output routing decisions and parameters in a single structured decode, eliminating the need for separate routing logic
Journey Context:
Teams typically start with if/then logic on raw text output, which breaks when the LLM formats its answers inconsistently. They then add a 'router' model (such as a classifier) to choose tools, which doubles the number of model calls. The 2025 pattern is to use structured generation (Outlines, llama.cpp grammars, OpenAI Structured Outputs with strict JSON schemas) to generate a 'ThoughtAndAction' object in one pass. The schema includes 'reasoning' and 'next_tool' fields. This replaces 'generate text -> parse -> classify -> call tool' with 'generate structured object -> execute'. It saves a model call per step, the output is guaranteed to conform to the schema, and an entire class of parsing errors disappears. The routing choice itself can still be wrong, so the selected tool and its arguments still need validation before execution.
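The 'generate structured object -> execute' flow can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the tool names (`search_docs`, `run_sql`), the `arguments` field, and the `constrained_generate` stub are all hypothetical, and the stub returns canned JSON in place of a real constrained-decoding call (Outlines, a llama.cpp grammar, or a strict-schema API request would go there).

```python
import json
from typing import Any, Callable

# Hypothetical tools; names and signatures are illustrative only.
def search_docs(query: str) -> str:
    return f"results for {query!r}"

def run_sql(statement: str) -> str:
    return f"executed: {statement}"

TOOLS: dict[str, Callable[..., str]] = {
    "search_docs": search_docs,
    "run_sql": run_sql,
}

# JSON Schema for the 'ThoughtAndAction' object. The enum limits next_tool
# to registered tools, so the decoder cannot emit an unknown tool name.
# 'arguments' is an assumed extra field; a real schema would likely
# constrain it per tool.
THOUGHT_AND_ACTION_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "next_tool": {"type": "string", "enum": sorted(TOOLS)},
        "arguments": {"type": "object"},
    },
    "required": ["reasoning", "next_tool", "arguments"],
    "additionalProperties": False,
}

def constrained_generate(prompt: str, schema: dict[str, Any]) -> str:
    """Stand-in for a constrained-decoding backend (assumption).

    A real backend would decode under `schema`; this stub returns a
    fixed, schema-conforming object so the sketch runs offline.
    """
    return json.dumps({
        "reasoning": "The user wants policy text, so search the docs.",
        "next_tool": "search_docs",
        "arguments": {"query": "refund policy"},
    })

def route_and_execute(prompt: str) -> str:
    # One decode yields both the routing decision and the parameters.
    action = json.loads(constrained_generate(prompt, THOUGHT_AND_ACTION_SCHEMA))
    tool = TOOLS[action["next_tool"]]   # valid by construction of the enum
    return tool(**action["arguments"])  # argument values still need validation

print(route_and_execute("Where is the refund policy?"))
```

Deriving the `enum` from the tool registry keeps the schema and the dispatch table in sync, so adding a tool means editing one place.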
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:42:22.130105+00:00— report_created — created