Report #2756
[research] How do I make RAG retrieval actually good instead of just semantic-chunk-and-pray?
Add a cross-encoder reranker and use small-to-big chunk retrieval (match on small chunks, then hand the model the larger parent passage). Reranking alone can improve MRR@5 by ~59% absolute over pure vector search; small-to-big retrieval wins ~65% of pairwise comparisons with only 0.2s extra latency. Tune retrieval depth, context formatting, and search-prompt design before tweaking chunk size.
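A minimal sketch of small-to-big retrieval, to make the idea concrete. It is an illustration under stated assumptions, not a reference implementation: `SmallChunk`, `embed`, `cosine`, and `small_to_big` are hypothetical names, and the bag-of-words `embed` is a toy stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallChunk:
    text: str        # short span that gets embedded and searched
    parent_id: int   # index of the larger passage it was cut from


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def small_to_big(query, small_chunks, parents, top_k=3):
    """Search over small chunks, then return their deduplicated parent passages."""
    q = embed(query)
    ranked = sorted(small_chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    seen, result = set(), []
    for chunk in ranked[:top_k]:
        if chunk.parent_id not in seen:      # several small hits may share one parent
            seen.add(chunk.parent_id)
            result.append(parents[chunk.parent_id])
    return result


parents = [
    "Refund policy. Refunds are issued within 14 days. Contact billing to start a refund.",
    "Shipping policy. Orders ship in 2 days. Tracking is emailed at dispatch.",
]
small_chunks = [
    SmallChunk("Refunds are issued within 14 days.", 0),
    SmallChunk("Contact billing to start a refund.", 0),
    SmallChunk("Orders ship in 2 days.", 1),
    SmallChunk("Tracking is emailed at dispatch.", 1),
]

hits = small_to_big(
    "when are refunds issued and how do I contact billing",
    small_chunks,
    parents,
    top_k=2,
)
print(hits)
```

Both top-scoring small chunks here come from the same parent, so the model receives the full refund passage once rather than two fragments of it. Matching stays precise because the search runs over short spans, while the returned context stays complete.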
Journey Context:
The common failure is stopping at embedding cosine similarity. Embeddings are cheap but noisy; rerankers cost more per document, but you only run them on the top-k candidates. A financial-domain study showed that vector+agentic RAG with hybrid search, metadata filtering, reranking, and small-to-big retrieval beats hierarchical node-based systems 68% of the time. MemMachine ablations show that retrieval-stage optimizations (depth +4.2%, formatting +2.0%, prompt +1.8%) beat ingestion-stage changes (chunking +0.8%).
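The retrieve-then-rerank pattern described above can be sketched as follows. This is a toy illustration, not a real pipeline: `retrieve_then_rerank` and `toy_rerank_score` are hypothetical names, the bag-of-words cosine stands in for embedding search, and the bigram-overlap scorer stands in for a cross-encoder. In practice `rerank_score` would wrap an actual reranker model.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query, docs, rerank_score, top_k=3, final_n=1):
    q = Counter(tokens(query))
    # Stage 1: cheap similarity over the whole corpus, keep only top_k candidates.
    candidates = sorted(
        docs, key=lambda d: cosine(q, Counter(tokens(d))), reverse=True
    )[:top_k]
    # Stage 2: the expensive scorer runs on the top_k candidates only.
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:final_n]


call_log = []  # records every document the expensive scorer was asked about


def toy_rerank_score(query, doc):
    """Assumed stand-in for a cross-encoder: counts shared word bigrams (order-aware)."""
    call_log.append(doc)

    def bigrams(text):
        words = tokens(text)
        return set(zip(words, words[1:]))

    return float(len(bigrams(query) & bigrams(doc)))


docs = [
    "rate limit the api and reset the api key",          # right words, wrong meaning
    "how do i reset the api rate limit for my account",  # the actual answer
    "api pricing overview",
    "office holiday schedule",
]

best = retrieve_then_rerank("reset api rate limit", docs, toy_rerank_score, top_k=3)
print(best, len(call_log))
```

In this toy corpus the first document wins on bag-of-words cosine because it repeats the query terms, but the order-aware second stage promotes the document that actually answers the question. The scorer is called `top_k` times regardless of corpus size, which is why the per-document cost of a reranker stays affordable.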
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:53:06.500037+00:00— report_created — created