Report #76417
[research] Agent is tricked into contradicting its own knowledge by irrelevant but authoritative-sounding retrieved documents
Instruct the model to explicitly evaluate the relevance of the retrieved context before answering. Use a prompt structure such as: 'If the provided documents do not contain the answer, ignore them and use your internal knowledge, stating "No relevant context found."'
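A minimal sketch of how such a prompt could be assembled, assuming the retrieved documents arrive as a list of plain strings. The names build_prompt, rejected_context, and NO_CONTEXT_MARKER are hypothetical and not part of any library; the exact wording of the two steps is one possible phrasing, not a tested template.

```python
# Hypothetical marker the model is asked to emit when it rejects the context.
NO_CONTEXT_MARKER = "No relevant context found"


def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that asks for a relevance check before the answer."""
    # Number each document so the model can refer to it explicitly.
    numbered = "\n\n".join(
        f"[Document {i}]\n{doc.strip()}"
        for i, doc in enumerate(documents, start=1)
    )
    return (
        "You are given retrieved documents and a question.\n"
        "Step 1: Decide whether the documents are relevant to the question.\n"
        "Step 2: If the provided documents do not contain the answer, ignore "
        "them and use your internal knowledge, stating "
        f"'{NO_CONTEXT_MARKER}.' Otherwise, answer using the documents.\n\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}\n"
    )


def rejected_context(model_output: str) -> bool:
    """Return True if the output contains the rejection marker."""
    # Case-insensitive substring match; an assumption about how the
    # model will phrase the marker, so treat it as a rough signal only.
    return NO_CONTEXT_MARKER.lower() in model_output.lower()
```

Checking the output with rejected_context makes it possible to log how often retrieval returned nothing useful, which may help separate retrieval failures from model failures.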
Journey Context:
RAG systems assume retrieved documents are helpful. However, retrieval often returns top-k results that are off-topic but persuasively written. LLMs are susceptible to such 'distractor' context and may override correct internal knowledge to parrot the flawed retrieved text. Giving the model explicit permission to reject the context reduces the risk that the retrieval step injects hallucinations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:51:49.061886+00:00— report_created — created