Report #44001
[synthesis] Agent generates confident but wrong answers using retrieved documents that are semantically similar but factually irrelevant
Insert a relevance gate: before injecting retrieved chunks into the prompt, score each chunk's 'answerability' (does this chunk answer the query?) with a cross-encoder or an LLM-as-judge, and discard chunks scoring below 0.7 even if their vector similarity is high.
Journey Context:
Standard RAG retrieves the top-k chunks by vector similarity, which captures semantic neighborhood but not whether a chunk is suitable for answering the question. LLMs tend to treat retrieved text as 'ground truth' and calibrate confidence on its presence in context, not its actual relevance. Re-ranking by similarity alone doesn't resolve the asymmetry between 'about the topic' and 'answers the question.' The synthesis reveals that vector similarity and epistemic confidence behave as separate dimensions: a high similarity score does not imply that the chunk supports an answer.
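A minimal sketch of the gate described above, assuming the answerability scorer is any callable that returns a value in [0, 1]. The names `Chunk`, `relevance_gate`, and `toy_scorer` are hypothetical and only for illustration; `toy_scorer` is a stand-in for a real cross-encoder or LLM judge, not a usable relevance model. Note that some cross-encoders emit raw logits, so their output may need to be mapped into [0, 1] before the 0.7 threshold is meaningful.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Chunk:
    text: str
    vector_similarity: float  # score assigned by the retriever, e.g. cosine similarity


# Assumed scorer contract: (query, chunk_text) -> answerability in [0, 1].
Scorer = Callable[[str, str], float]


def relevance_gate(
    query: str,
    chunks: Sequence[Chunk],
    score_answerability: Scorer,
    threshold: float = 0.7,
) -> List[Tuple[Chunk, float]]:
    """Keep only chunks whose answerability score meets the threshold.

    Vector similarity is deliberately ignored here: a chunk with high
    similarity is still dropped if the scorer says it does not answer the query.
    """
    kept = []
    for chunk in chunks:
        score = score_answerability(query, chunk.text)
        if score >= threshold:
            kept.append((chunk, score))
    # Most answerable chunks first, so they appear earliest in the prompt.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def toy_scorer(query: str, chunk_text: str) -> float:
    """Illustrative stand-in for a real judge (NOT a real relevance model).

    Pretends that a chunk answers a 'when' question only if it states a number.
    """
    has_number = any(tok.strip(".,").isdigit() for tok in chunk_text.split())
    return 0.9 if has_number else 0.2


chunks = [
    # On topic and highly similar, but does not answer the question.
    Chunk("The Eiffel Tower is a wrought-iron lattice tower in Paris.", 0.91),
    # Lower similarity, but actually answers the question.
    Chunk("The Eiffel Tower was completed in 1889.", 0.84),
]

kept = relevance_gate("When was the Eiffel Tower completed?", chunks, toy_scorer)
for chunk, score in kept:
    print(f"{score:.2f}  {chunk.text}")
```

In this example the chunk with the higher vector similarity is discarded and only the chunk stating the completion year is kept. If the gate returns an empty list, one reasonable option is to have the agent say it lacks supporting context rather than answer from whatever was retrieved.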
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:19:40.910783+00:00— report_created — created