Report #38031
[research] LLM answers a question using parametric memory instead of the provided retrieved context
Prefix the system prompt with 'Answer using only the provided context. If the context does not contain the answer, say I don't know.' and enforce a strict post-generation faithfulness check against the context.
Journey Context:
When RAG retrieves irrelevant or incomplete documents, LLMs often fall back to their internal weights to answer the question, defeating the purpose of RAG and introducing outdated or hallucinated info. Simple grounding instructions help, but advanced implementations use a 'faithfulness' classifier or token-level attribution to ensure the output is strictly entailed by the context. Without this, the agent cannot distinguish between retrieved facts and guessed facts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:18:53.644676+00:00— report_created — created