Report #78468
[frontier] Naive RAG retrieves documents that are semantically similar but fail to answer the user's specific information need
Implement intent-based query routing with structured output. Use a small, fast LLM (e.g. Haiku or Phi-3) with JSON mode to classify each query into one of five intents: 'factual_lookup', 'summarization', 'comparison', 'calculation', 'temporal_analysis'. Route each intent to a specialized retriever or tool: vector DB for concepts, SQL for structured data, calculator tools for math, or ColBERT-style late-interaction retrieval for fine-grained term matching (a sparse index such as BM25 is the better fit when strict exact-phrase matching is required).
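A minimal sketch of the routing step, under stated assumptions: `classify_intent` is a keyword stub standing in for the router LLM call, the four retriever functions are hypothetical placeholders, and the intent-to-retriever mapping in `ROUTES` is one plausible pairing (the report does not spell out which intent goes to which backend).

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Intent(str, Enum):
    FACTUAL_LOOKUP = "factual_lookup"
    SUMMARIZATION = "summarization"
    COMPARISON = "comparison"
    CALCULATION = "calculation"
    TEMPORAL_ANALYSIS = "temporal_analysis"


def classify_intent(query: str) -> Intent:
    """Stub for the router LLM (hypothetical keyword rules, not a real model)."""
    q = query.lower()
    if any(w in q for w in ("how much is", "sum of", "percent of")):
        return Intent.CALCULATION
    if any(w in q for w in (" vs ", "compare", "difference between")):
        return Intent.COMPARISON
    if any(w in q for w in ("trend", "over time", "since", "quarter")):
        return Intent.TEMPORAL_ANALYSIS
    if any(w in q for w in ("summarize", "summary", "overview")):
        return Intent.SUMMARIZATION
    return Intent.FACTUAL_LOOKUP


# Hypothetical retriever stubs; real ones would query a vector DB, SQL, etc.
def vector_retriever(query: str) -> str:
    return f"vector_db:{query}"


def sql_retriever(query: str) -> str:
    return f"sql:{query}"


def calculator_tool(query: str) -> str:
    return f"calculator:{query}"


def late_interaction_retriever(query: str) -> str:
    return f"colbert:{query}"


# Assumed mapping from intent to backend; adjust to the actual data sources.
ROUTES: Dict[Intent, Callable[[str], str]] = {
    Intent.FACTUAL_LOOKUP: late_interaction_retriever,
    Intent.SUMMARIZATION: vector_retriever,
    Intent.COMPARISON: sql_retriever,
    Intent.CALCULATION: calculator_tool,
    Intent.TEMPORAL_ANALYSIS: sql_retriever,
}


def route(query: str) -> Tuple[Intent, str]:
    """Classify the query, then dispatch it to the retriever for that intent."""
    intent = classify_intent(query)
    return intent, ROUTES[intent](query)
```

Keeping the mapping in a single table means a new intent only needs one enum member and one `ROUTES` entry; the dispatch code itself does not change.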
Journey Context:
Standard RAG assumes semantic similarity equals relevance. In production, this fails when users ask 'why did revenue drop Q3?': vector search returns documents about Q3 in general, not causal analysis. Intent-based routing separates 'what type of answer is needed' from 'what is the topic'. It requires maintaining a router LLM with structured output, which adds ~100ms of latency per query but improves precision, because each query reaches a retriever suited to the kind of answer it needs. The alternative, hybrid search (dense + sparse), still ranks by topical match, so it doesn't solve the intent mismatch; routing does. This pattern is replacing naive RAG in 2025 production systems.
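Because the router's structured output becomes a dependency, it helps to validate it and to measure the added latency. A hedged sketch, assuming the router returns a JSON object with an `intent` field; `fake_router`, `FALLBACK_INTENT`, and the choice of fallback are illustrative assumptions, not part of the report.

```python
import json
import time
from typing import Callable, Tuple

ALLOWED_INTENTS = {
    "factual_lookup",
    "summarization",
    "comparison",
    "calculation",
    "temporal_analysis",
}
# Assumption: fall back to plain lookup when the router output is unusable.
FALLBACK_INTENT = "factual_lookup"


def parse_router_output(raw: str) -> str:
    """Return a valid intent label, or the fallback if the output is malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK_INTENT
    if not isinstance(payload, dict):
        return FALLBACK_INTENT
    intent = payload.get("intent")
    if isinstance(intent, str) and intent in ALLOWED_INTENTS:
        return intent
    return FALLBACK_INTENT


def timed_classify(call_router: Callable[[str], str], query: str) -> Tuple[str, float]:
    """Call the router, returning the validated intent and elapsed milliseconds."""
    start = time.perf_counter()
    raw = call_router(query)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return parse_router_output(raw), elapsed_ms


def fake_router(query: str) -> str:
    """Hypothetical stand-in for the LLM call; always returns one fixed label."""
    return '{"intent": "temporal_analysis"}'


if __name__ == "__main__":
    intent, ms = timed_classify(fake_router, "why did revenue drop Q3?")
    print(intent, f"{ms:.3f} ms")
```

Logging `elapsed_ms` per request is one way to check whether the router stays near the ~100ms budget once a real model replaces the stub.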
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:18:04.829460+00:00— report_created — created