Report #88799
[research] LLM adopts false facts from irrelevant or adversarial documents in a RAG retrieval set
Implement a strict relevance-filtering step (e.g., an NLI classifier or LLM-as-a-judge) between retrieval and generation, and explicitly instruct the model to ignore retrieved documents that contradict each other unless one is authoritative.
Journey Context:
RAG implicitly assumes that retrieved documents are true and relevant. If the retriever fetches a document containing a common misconception or adversarial text, the LLM will often parrot it, overriding its own correct parametric knowledge. This is the 'distractor' failure mode. The retriever cannot be trusted blindly; the generator needs a defense against poisoned or irrelevant context, even if that means discarding the retrieved context entirely.
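A minimal sketch of where such a filter could sit, assuming nothing about the actual retriever or model in use. All names here (`Doc`, `overlap_judge`, `filter_docs`, `build_prompt`, the `authoritative` flag, the `0.3` threshold) are hypothetical and chosen for illustration. The `overlap_judge` function is a toy lexical stand-in so the example runs without dependencies; it is not an NLI classifier or an LLM judge, and either of those would be plugged in through the same `judge` parameter.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Doc:
    text: str
    authoritative: bool = False  # hypothetical flag, set by whoever curates the corpus

# A judge maps (question, document text) to a relevance score in [0, 1].
Judge = Callable[[str, str], float]

def _tokens(s: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def overlap_judge(question: str, doc_text: str) -> float:
    """Toy stand-in judge: fraction of question tokens that appear in the document.

    Assumption: a real deployment would replace this with an NLI classifier
    or an LLM-as-a-judge call that has the same signature.
    """
    q = _tokens(question)
    if not q:
        return 0.0
    return len(q & _tokens(doc_text)) / len(q)

def filter_docs(question: str, docs: List[Doc],
                judge: Judge = overlap_judge,
                threshold: float = 0.3) -> List[Doc]:
    """Keep only documents the judge scores at or above the threshold."""
    return [d for d in docs if judge(question, d.text) >= threshold]

def build_prompt(question: str, docs: List[Doc]) -> str:
    """Build the generator prompt; fall back to no context if nothing survives."""
    if not docs:
        return ("No reliable documents were retrieved. "
                "Answer from your own knowledge, and say so if you are unsure.\n\n"
                f"Question: {question}")
    lines = [
        "Use the documents below only if they are relevant to the question.",
        "If documents contradict each other, ignore them unless one is "
        "marked authoritative; in that case, prefer the authoritative one.",
        "",
    ]
    for i, d in enumerate(docs, 1):
        tag = "authoritative" if d.authoritative else "unverified"
        lines.append(f"[{i}] ({tag}) {d.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

if __name__ == "__main__":
    question = "What is the boiling point of water at sea level?"
    retrieved = [
        Doc("Water boils at 100 degrees Celsius at sea level.", authoritative=True),
        Doc("The stock market closed higher on Friday."),
    ]
    print(build_prompt(question, filter_docs(question, retrieved)))
```

Note that a lexical judge like this one only screens out off-topic documents. An on-topic document that states a false fact would pass it, which is why the fix above calls for an NLI or LLM-based judge and for the contradiction instruction in the prompt.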
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:38:01.558088+00:00— report_created — created