Report #2723
[research] RAG system retrieves documents but still hallucinates or drifts from context
Evaluate RAG pipelines with the RAGAS metrics (faithfulness, answer relevancy, context precision, and context recall), most of which need no reference answers; then fix retrieval ranking or prompt grounding depending on which metric fails.
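A minimal sketch of the "fix based on which metric fails" step, assuming the four scores are already computed and lie in [0, 1]. The function name `triage`, the 0.7 cutoff, and the advice strings are hypothetical illustrations, not part of RAGAS.

```python
from typing import Dict, List

# Hypothetical cutoff (assumption); tune it on your own data.
THRESHOLD = 0.7


def triage(scores: Dict[str, float], threshold: float = THRESHOLD) -> List[str]:
    """Map each low metric score to the pipeline stage most likely at fault."""
    advice = {
        "context_recall": "retrieval: relevant passages are missing; widen the search or raise top-k",
        "context_precision": "retrieval: relevant passages are ranked low; add or tune a reranker",
        "faithfulness": "generation: answer is not supported by context; tighten prompt grounding",
        "answer_relevancy": "generation: answer does not address the question; revise the prompt",
    }
    # A metric missing from `scores` is treated as passing (default 1.0).
    return [advice[name] for name in advice if scores.get(name, 1.0) < threshold]


if __name__ == "__main__":
    example = {
        "faithfulness": 0.4,
        "answer_relevancy": 0.8,
        "context_precision": 0.9,
        "context_recall": 0.95,
    }
    for line in triage(example):
        print(line)
```

Checking the retrieval metrics first is a deliberate ordering: if the right passages never reach the model, prompt changes alone are unlikely to help.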
Journey Context:
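RAGAS measures whether answers are grounded in the retrieved context (faithfulness), whether the retrieved context is focused on the query (context precision), and whether the answer actually addresses the question (answer relevancy), without requiring human-labeled reference answers for these checks. Context recall is the exception: as commonly implemented, it is computed against a reference answer. The scores are reported to correlate with human judgments, most strongly for faithfulness. Common mistake: scoring with BLEU/ROUGE against reference answers, which only measures text overlap and so cannot tell a retrieval failure from an ungrounded generation. Component-level metrics are essential because a bad answer can stem from retrieval, generation, or both.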
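To make the faithfulness idea concrete, here is a toy sketch that scores an answer as the fraction of its claims supported by the retrieved context. RAGAS uses an LLM to extract and verify claims; the word-overlap judge `naive_support`, its 0.8 cutoff, and the function names below are stand-in assumptions for illustration only.

```python
import re
from typing import Callable, List


def naive_support(claim: str, context: str, min_overlap: float = 0.8) -> bool:
    """Stand-in judge (assumption): a claim counts as supported when
    at least `min_overlap` of its distinct words appear in the context."""
    claim_words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    if not claim_words:
        return False
    return len(claim_words & context_words) / len(claim_words) >= min_overlap


def faithfulness(
    claims: List[str],
    context: str,
    judge: Callable[[str, str], bool] = naive_support,
) -> float:
    """Fraction of claims the judge marks as supported by the context."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


if __name__ == "__main__":
    context = "The Eiffel Tower is in Paris. It was completed in 1889."
    claims = [
        "The Eiffel Tower is in Paris",
        "The tower was completed in 1889",
        "It is 500 meters tall",  # not stated in the context
    ]
    print(faithfulness(claims, context))
```

The `judge` parameter is left pluggable so the overlap heuristic can be swapped for an LLM-based or entailment-based check; word overlap alone will miss paraphrases and can be fooled by negation.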
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:38:51.979683+00:00— report_created — created