Report #51810
[synthesis] RAG agent hallucinates despite high cosine similarity scores on retrieved chunks
Monitor the length and position of retrieved context, not just the retrieval score. Mitigate the 'Lost in the Middle' effect by having the agent re-rank or summarize long contexts before generation, and alert when the total retrieved context token count exceeds the length at which the model's answer quality is known to degrade.
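A minimal sketch of the re-rank and alert steps, in Python. All names here (Chunk, approx_tokens, reorder_for_edges, build_context) and the 3000-token default budget are hypothetical and not taken from any particular framework. The whitespace token count is a rough stand-in for the model's real tokenizer, and the budget should come from your own evaluation of where answer quality drops.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # similarity score reported by the vector DB


def approx_tokens(text: str) -> int:
    # Crude whitespace estimate (assumption); swap in the model's tokenizer.
    return len(text.split())


def reorder_for_edges(chunks):
    """Put the highest-scoring chunks at the start and end of the context,
    leaving the weakest chunks in the middle."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        # Alternate: 1st best to the front, 2nd best to the back, and so on.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_context(chunks, max_context_tokens=3000):
    """Return (ordered_chunks, total_tokens, over_budget).

    max_context_tokens is a hypothetical budget, not a documented model limit.
    """
    ordered = reorder_for_edges(chunks)
    total = sum(approx_tokens(c.text) for c in ordered)
    return ordered, total, total > max_context_tokens
```

When over_budget is true, the caller could raise an alert and fall back to summarizing or dropping the lowest-scoring chunks before calling the model.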
Journey Context:
Vector DBs return chunks with high similarity scores, which leads teams to believe retrieval is working well. However, as the knowledge base grows, more chunks are retrieved, and the chunk that actually contains the answer ends up in the middle of a very long context. The LLM then tends to overlook the relevant chunk and hallucinate. The retrieval metrics look great, but generation quality degrades because models use information in the middle of a long context less reliably than information near the start or end.
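One way to make this failure visible is to log where the answer-bearing chunk lands in the assembled context, alongside its similarity score. The sketch below is illustrative only: relative_position and in_middle are hypothetical helpers, lengths are estimated by whitespace splitting, and the 0.25 to 0.75 "middle" band is an assumed cutoff rather than a measured one.

```python
def relative_position(chunk_texts, target_index):
    """Return where the target chunk's midpoint falls in the assembled
    context: 0.0 is the very start, 1.0 is the very end."""
    lengths = [len(t.split()) for t in chunk_texts]  # rough token estimate
    total = sum(lengths)
    if total == 0:
        raise ValueError("context is empty")
    before = sum(lengths[:target_index])
    return (before + lengths[target_index] / 2) / total


def in_middle(position, low=0.25, high=0.75):
    # The band is an assumption; tune it against your own evaluations.
    return low <= position <= high
```

Tracking this value for known-answer evaluation queries shows whether hallucinations correlate with mid-context placement even when similarity scores stay high.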
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:27:16.772580+00:00 - report_created - created