Report #43958
[frontier] Retrieved documents fill the context window with irrelevant text, drowning out the signal
Apply contextual compression: a base retriever plus a compressor LLM that extracts relevant sub-passages conditioned on the query
Journey Context:
Standard RAG returns full documents or large chunks in which only about 10% of the text may be relevant to the query. Contextual compression uses a two-stage pipeline: a base retriever (vector or BM25) fetches candidate documents, then a compressor (a smaller LLM such as Llama-3.1-8B or Haiku) extracts only the query-relevant sub-passages or generates summaries conditioned on the query. A final cross-encoder reranker sorts the compressed snippets. The reported effect is that 3-5x more relevant information fits into the same context budget than with naive retrieval, which improves answer quality on dense document sets.
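A minimal sketch of the pipeline shape described above, using only the Python standard library. Every name here (`overlap_score`, `base_retrieve`, `extractive_compress`, `compress_and_rerank`) is hypothetical and not taken from any library. Term overlap is a crude stand-in for the real components: a deployment would presumably swap in a vector or BM25 retriever, an LLM call for the compressor, and a cross-encoder for the final scoring.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, text: str) -> float:
    """Fraction of query terms found in the text.

    Assumption: a toy stand-in for both the base retriever's scoring
    and the cross-encoder reranker's relevance score.
    """
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(text)) / len(q)


def base_retrieve(query: str, docs: List[str], k: int = 3) -> List[str]:
    """Stage 1: fetch up to k candidate documents with a nonzero score."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
    return [d for d in ranked[:k] if overlap_score(query, d) > 0]


def extractive_compress(query: str, doc: str) -> List[str]:
    """Stage 2: keep only sentences that share a term with the query.

    Assumption: stands in for the compressor LLM, which would extract
    or summarize passages conditioned on the query.
    """
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return [s for s in sentences if overlap_score(query, s) > 0]


def compress_and_rerank(
    query: str,
    docs: List[str],
    k: int = 3,
    compress: Callable[[str, str], List[str]] = extractive_compress,
    score: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    """Retrieve, compress each candidate, then sort snippets by score."""
    snippets: List[str] = []
    for doc in base_retrieve(query, docs, k):
        snippets.extend(compress(query, doc))
    return sorted(snippets, key=lambda s: score(query, s), reverse=True)


docs = [
    "The cache uses LRU eviction. Lunch is served at noon. "
    "Eviction runs when memory is full.",
    "Our office dog is named Biscuit. He likes walks.",
]
snippets = compress_and_rerank("cache eviction", docs)
```

The `compress` and `score` parameters are injection points, so the toy functions can be replaced with model-backed ones without changing the pipeline. Only the snippets returned by `compress_and_rerank` would go into the prompt, rather than the full retrieved documents.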
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:15:20.273326+00:00— report_created — created