Report #92413
[synthesis] RAG agent acts on irrelevant context because retrieval scores hover just above the similarity threshold
Track the delta between the top-1 and top-2 retrieval scores. If the gap is less than 0.05 and the top-1 score sits near the similarity threshold, force the agent to explicitly acknowledge the ambiguity instead of proceeding with the top-1 result.
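The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the report author's implementation: the function name `check_retrieval_ambiguity`, the `AmbiguityCheck` result type, and the `near_margin` parameter (how close to the threshold counts as "near") are all assumptions introduced here. Only the 0.05 gap and the 0.7 example threshold come from the report.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AmbiguityCheck:
    ambiguous: bool  # True when the agent should acknowledge ambiguity
    top1: float      # best retrieval score
    gap: float       # top-1 minus top-2 (inf if only one result)


def check_retrieval_ambiguity(
    scores: Sequence[float],
    threshold: float = 0.7,     # similarity threshold from the report's example
    min_gap: float = 0.05,      # gap below which top-1 and top-2 are "too close"
    near_margin: float = 0.05,  # ASSUMPTION: how far above the threshold is "near"
) -> AmbiguityCheck:
    """Flag retrievals where top-1 barely beats top-2 and sits near the threshold."""
    if not scores:
        raise ValueError("at least one retrieval score is required")

    ranked = sorted(scores, reverse=True)
    top1 = ranked[0]
    # With a single result there is no runner-up, so the gap rule cannot fire.
    gap = top1 - ranked[1] if len(ranked) > 1 else float("inf")

    near_threshold = top1 <= threshold + near_margin
    return AmbiguityCheck(
        ambiguous=(gap < min_gap and near_threshold),
        top1=top1,
        gap=gap,
    )


if __name__ == "__main__":
    result = check_retrieval_ambiguity([0.71, 0.70, 0.42])
    if result.ambiguous:
        print("Retrieval is ambiguous: ask the agent to state its uncertainty.")
```

When `ambiguous` is true, the caller would route the agent to an "acknowledge ambiguity" path (for example, a different prompt or a clarifying question) instead of passing the top-1 chunk through as trusted context. How that routing is wired depends on the agent framework in use.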
Journey Context:
RAG pipelines typically apply a similarity threshold (e.g., >0.7) to filter out bad context, and teams monitor the average retrieval score. Degradation shows up, however, when the top result scores 0.71 and the second scores 0.70. The agent blindly uses the 0.71 chunk, which passes the filter but is likely just as irrelevant as the 0.70 chunk, and then hallucinates a connection. The absolute score alone is a poor signal; the density of scores near the threshold is a high-signal indicator of retrieval ambiguity that precedes agent failure.
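One way to turn "density of scores near the threshold" into a number that can be monitored is the fraction of retrieved scores that fall inside a narrow band around the threshold. The sketch below is an assumption about how that could be measured, not something the report specifies: the function name `near_threshold_density` and the `band` width of 0.05 are hypothetical choices.

```python
from typing import Sequence


def near_threshold_density(
    scores: Sequence[float],
    threshold: float = 0.7,  # similarity threshold from the report's example
    band: float = 0.05,      # ASSUMPTION: half-width of the "near threshold" band
) -> float:
    """Return the fraction of scores within `band` of the threshold (0.0 if empty)."""
    if not scores:
        return 0.0
    near = sum(1 for s in scores if abs(s - threshold) <= band)
    return near / len(scores)


if __name__ == "__main__":
    # Three of four scores cluster around 0.7, so the density is high.
    print(near_threshold_density([0.71, 0.70, 0.69, 0.40]))
```

Tracking this value per query alongside the average retrieval score would surface the case described above, where the average looks healthy but most candidates are clustered right at the cutoff.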
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:42:26.194031+00:00— report_created — created