Report #65829
[research] Agent hallucinates instead of using a tool, or uses a tool when it should rely on RAG context
Add a route correctness eval to the agent's first step trace, verifying that the chosen action (tool vs. internal knowledge vs. RAG) aligns with the query intent before proceeding.
Journey Context:
Agents often have multiple ways to answer a question. If the retrieval tool is slow, the LLM might guess the answer instead of calling it. If you evaluate only the final answer, you miss the fact that the agent bypassed the RAG pipeline, which risks factual errors even when the answer happens to look right. Evaluating the routing decision separately checks that the agent takes the safe, grounded path, and lets you penalize hallucinated shortcuts in your eval scores.
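A minimal sketch of such a route check, in Python. Everything here is an assumption for illustration, not any tracing framework's real schema: the `TraceStep` and `EvalCase` records, the `kind` strings ("tool_call", "retrieval", "llm_answer"), the `KIND_TO_ROUTE` mapping, and the 0.5 penalty are all hypothetical. The expected route per query is assumed to come from a hand-labeled eval set.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    TOOL = "tool"
    RAG = "rag"
    INTERNAL = "internal"


@dataclass
class TraceStep:
    # Hypothetical trace record; adapt to whatever your tracing layer emits.
    kind: str       # assumed values: "tool_call", "retrieval", "llm_answer"
    name: str = ""  # e.g. the tool or retriever name, if any


@dataclass
class EvalCase:
    query: str
    expected_route: Route   # hand-labeled intent for this query
    trace: list             # list of TraceStep, in execution order


# Assumed mapping from a step's kind to the route it represents.
KIND_TO_ROUTE = {
    "tool_call": Route.TOOL,
    "retrieval": Route.RAG,
    "llm_answer": Route.INTERNAL,
}


def chosen_route(trace):
    """Return the route implied by the FIRST step of the trace."""
    if not trace:
        raise ValueError("empty trace: no routing decision to evaluate")
    return KIND_TO_ROUTE[trace[0].kind]


def route_score(case):
    """1.0 if the first action matches the labeled route, else 0.0."""
    return 1.0 if chosen_route(case.trace) is case.expected_route else 0.0


def combined_score(case, answer_score, wrong_route_penalty=0.5):
    """Scale the final-answer score down when the route was wrong,
    so a lucky guess that skipped retrieval does not score as well
    as a grounded answer."""
    if route_score(case) == 1.0:
        return answer_score
    return answer_score * (1.0 - wrong_route_penalty)


if __name__ == "__main__":
    # The agent answered directly although the query needed RAG context.
    shortcut = EvalCase(
        query="What is our refund policy?",
        expected_route=Route.RAG,
        trace=[TraceStep("llm_answer")],
    )
    print(route_score(shortcut), combined_score(shortcut, answer_score=0.8))
```

Reporting `route_score` as its own metric, next to the answer score, keeps routing regressions visible even when final answers still look acceptable. The multiplicative penalty is only one possible design; a hard fail on a wrong route is a stricter alternative.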
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:58:30.248603+00:00— report_created — created