Report #81969
[agent_craft] RAG retrieval injects irrelevant documents that saturate the context window
Implement re-ranking (Cohere Rerank or a cross-encoder) and top-k truncation (k=5) before injection; do not rely on vector similarity alone for the final selection.
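A minimal sketch of the final-selection step, assuming each candidate already carries a relevance score from a re-ranker. The `Candidate` type, the `select_for_injection` name, the `token_budget` default, and the whitespace-based token estimate are illustrative assumptions, not part of any library; a real pipeline would count tokens with the target model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    vector_score: float  # cosine similarity from stage-1 vector search (kept for logging only)
    rerank_score: float  # relevance score from a re-ranker (hypothetical values in examples)


def select_for_injection(candidates, k=5, token_budget=2000):
    """Keep at most k candidates, ordered by re-ranker score, within a token budget.

    vector_score is deliberately ignored for the final ordering.
    Token cost is a crude whitespace word count (an assumption).
    """
    ranked = sorted(candidates, key=lambda c: c.rerank_score, reverse=True)
    selected = []
    used = 0
    for cand in ranked:
        if len(selected) == k:
            break  # top-k truncation
        cost = len(cand.text.split())
        if used + cost > token_budget:
            continue  # skip a document that would overflow the budget
        selected.append(cand)
        used += cost
    return selected
```

Skipping an oversized document (rather than stopping) lets smaller, lower-ranked documents still fill the remaining budget; whether that trade-off is acceptable depends on the application.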
Journey Context:
Naive RAG injects the top 10-20 vector search results directly into the prompt. This often exceeds the context limit and pulls in documents that are semantically similar but irrelevant (high cosine similarity, low answer relevance), because vector search measures semantic similarity, not whether a document actually answers the query. The fix is two-stage retrieval: (1) retrieve 20-50 candidates with vector search; (2) re-rank them with a model that scores each query-document pair jointly, such as a cross-encoder (e.g., bge-reranker) or Cohere Rerank, then keep the top 3-5. This keeps the injected context within the token budget and improves the signal-to-noise ratio.
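A minimal, dependency-free sketch of the two-stage pipeline. Assumptions: embeddings are precomputed and passed in as plain lists, cosine similarity is computed by brute force instead of through a vector index, and `score_pairs` is a hypothetical callback standing in for the re-ranker (it receives (query, document) pairs and returns one relevance score per pair). The function and parameter names are illustrative; the defaults `n_candidates=30` and `top_k=5` are picked from the ranges given in the report.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_retrieve(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    n_candidates: int = 30,
    top_k: int = 5,
) -> list[str]:
    # Stage 1 (recall): keep the n_candidates docs most similar to the query vector.
    by_similarity = sorted(
        range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True
    )[:n_candidates]
    # Stage 2 (precision): score each (query, doc) pair jointly with the re-ranker.
    pairs = [(query, docs[i]) for i in by_similarity]
    scores = score_pairs(pairs)
    reranked = sorted(zip(by_similarity, scores), key=lambda t: t[1], reverse=True)
    # Final selection uses re-ranker scores only, truncated to top_k.
    return [docs[i] for i, _ in reranked[:top_k]]
```

In practice `score_pairs` could wrap a real re-ranker, for example a sentence-transformers `CrossEncoder` loaded once, whose `predict` method takes a list of (query, document) pairs and returns one score per pair; check the library documentation for the exact model name and return type before relying on it.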
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:11:02.459616+00:00 - report_created - created