Report #65687
[architecture] Low recall in vector similarity search causing RAG retrieval failures
Benchmark the vector index's recall against brute-force (exact) search on a representative dataset. For pgvector with IVFFlat, raise ivfflat.probes (default 1) to 50-100; with HNSW, raise hnsw.ef_search (default 40) to 100-400. In either case, keep increasing the value until recall@10 > 0.95. Never use default index parameters for production RAG without validating recall.
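The tuning loop described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `ivfflat.probes` and `hnsw.ef_search` are the pgvector settings named in this report, while `set_statement`, `first_value_meeting_target`, and the `measure_recall` callback are hypothetical names introduced here. `measure_recall` is assumed to apply the setting, run the benchmark queries, and return the measured recall@10.

```python
from typing import Callable, Iterable, Optional

# pgvector search-time settings named in the report, keyed by index type.
SETTINGS = {"ivfflat": "ivfflat.probes", "hnsw": "hnsw.ef_search"}


def set_statement(index_type: str, value: int) -> str:
    """Build the SQL that applies a search-time setting to the current session."""
    return f"SET {SETTINGS[index_type]} = {int(value)};"


def first_value_meeting_target(
    candidates: Iterable[int],
    measure_recall: Callable[[int], float],  # hypothetical benchmark callback
    target: float = 0.95,
) -> Optional[int]:
    """Return the smallest candidate whose measured recall exceeds the target.

    Candidates are tried in ascending order so the cheapest setting that
    meets the target wins. Returns None if no candidate is good enough.
    """
    for value in sorted(candidates):
        if measure_recall(value) > target:
            return value
    return None
```

Trying values in ascending order keeps the latency cost as low as the recall target allows. If the function returns None, the candidate range was too narrow or the index itself may need rebuilding with different build-time parameters.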
Journey Context:
Approximate Nearest Neighbor (ANN) indexes such as IVFFlat and HNSW sacrifice recall for speed, and their default settings favor low latency over accuracy. In a RAG pipeline, missing the correct chunk (low recall) causes the LLM to hallucinate or to answer 'I don't know' when the answer is actually in the corpus. The mistake is assuming the vector index 'just works', or tuning solely for latency and throughput (QPS). The correct approach is to calculate recall@k = (number of true nearest neighbors found) / k against a ground truth generated by exact search (KNN without an index). The tradeoff is that higher probes/ef_search values increase query latency, potentially requiring more CPU/memory or read replicas.
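The recall@k formula above can be illustrated with a small pure-Python sketch. It assumes Euclidean (L2) distance and an in-memory list of vectors; the function names are hypothetical, and in a real benchmark the approximate ids would come from the indexed query while the exact ids come from the same query run without the index.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Brute-force ground truth: ids of the k vectors closest to the query."""
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the k true nearest neighbors that the ANN result contains."""
    found = set(approx_ids[:k]) & set(exact_ids[:k])
    return len(found) / k


# Toy data: ids are positions in this list.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
truth = exact_top_k([0.0, 0.0], vectors, k=2)   # ground truth from exact search
ann_result = [0, 3]                             # pretend output of an ANN index
score = recall_at_k(ann_result, truth, k=2)     # one of two true neighbors found
```

In practice the score would be averaged over a few hundred representative queries rather than computed for a single one.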
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:44:17.763702+00:00— report_created — created