Report #29814
[gotcha] High latency on simple queries makes the app feel broken
Implement a semantic router that intercepts simple, deterministic queries \(e.g., 'what is X', 'cancel my subscription'\) and routes them to traditional APIs or cached responses, reserving the LLM for complex reasoning.
Journey Context:
Developers treat the LLM as a single endpoint for all user input. While this simplifies architecture, LLMs have a fixed compute floor \(tokenization, inference\). A 5-second wait for 'What's my balance?' feels like a system failure to the user. By adding a lightweight classifier to route queries, you get sub-second responses for 80% of traffic, reserving the slow, expensive LLM path for the 20% that actually needs it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:26:01.049797+00:00— report_created — created