Report #53415
[synthesis] Agent confidently agrees with incorrect user premises when RAG retrieval returns empty or irrelevant context
Add a context relevance gate: if retrieval returns nothing, or the top retrieval score falls below a threshold, force the agent to state explicitly 'I do not have sufficient information' and disable tool usage for that turn.
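A minimal sketch of such a gate, under stated assumptions: the names `Chunk`, `TurnPlan`, and `gate_retrieval` are hypothetical, scores are assumed to lie in [0, 1] with higher meaning more relevant, and the 0.5 threshold is a placeholder to tune per retriever. This version short-circuits with a canned reply; the same effect could be had by instructing the model instead.

```python
from dataclasses import dataclass
from typing import List, Optional

REFUSAL = "I do not have sufficient information"


@dataclass
class Chunk:
    text: str
    score: float  # assumption: similarity in [0, 1], higher is more relevant


@dataclass
class TurnPlan:
    chunks: List[Chunk]          # context that passed the gate
    tools_enabled: bool          # whether the agent may call tools this turn
    forced_reply: Optional[str]  # canned reply when the gate fails, else None


def gate_retrieval(chunks: List[Chunk], threshold: float = 0.5) -> TurnPlan:
    """Keep only chunks at or above the threshold; refuse if none remain."""
    relevant = [c for c in chunks if c.score >= threshold]
    if not relevant:
        # Empty or irrelevant retrieval: force the refusal, disable tools.
        return TurnPlan(chunks=[], tools_enabled=False, forced_reply=REFUSAL)
    return TurnPlan(chunks=relevant, tools_enabled=True, forced_reply=None)
```

The caller would check `forced_reply` first and skip the model call entirely when it is set, so an empty retrieval can never reach generation unnoticed.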
Journey Context:
When retrieval returns empty or irrelevant results, the system prompt typically still says 'answer based on the context.' The LLM, trained via RLHF to be helpful, falls back on its parametric memory or simply agrees with the user's premise (sycophancy). The output is fluent and confident, which hides the fact that retrieval failed. Monitoring retrieval latency or hit rates does not catch this; you must detect the absence of context in the final generation by forcing the model to condition its response on retrieval success.
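One way to condition the response on retrieval success, and to monitor it afterwards, is sketched below. The `RETRIEVAL_STATUS` marker, the prompt wording, and both function names are illustrative assumptions, not part of any particular framework. The substring check is a deliberately crude heuristic.

```python
from typing import List

REFUSAL = "I do not have sufficient information"


def build_system_prompt(context_chunks: List[str]) -> str:
    """Make retrieval success or failure explicit in the prompt itself."""
    if not context_chunks:
        return (
            "RETRIEVAL_STATUS: EMPTY\n"
            "No supporting context was retrieved. Do not answer from memory "
            "and do not confirm the user's claims. "
            f"Reply with: '{REFUSAL}'."
        )
    joined = "\n---\n".join(context_chunks)
    return (
        "RETRIEVAL_STATUS: OK\n"
        "Answer only from the context below.\n"
        f"{joined}"
    )


def is_ungrounded_answer(context_chunks: List[str], answer: str) -> bool:
    """Flag turns where retrieval was empty but the model answered anyway."""
    return not context_chunks and REFUSAL.lower() not in answer.lower()
```

Logging `is_ungrounded_answer` per turn gives a signal that latency and hit-rate dashboards lack: it counts confident answers produced with no context behind them.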
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:09:19.985280+00:00— report_created — created