Report #48681
[frontier] RAG retrieval losing cross-chunk context and returning irrelevant chunks
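Stop embedding fixed-size chunks in isolation; use Late Chunking or Contextual Retrieval instead. With late chunking, run the whole document through a long-context embedding model first, then pool the resulting token embeddings per chunk. With contextual retrieval, use a fast LLM to generate a short document-level context for each chunk and prepend it before embedding.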
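The contextual retrieval half of this fix can be sketched as below. This is a minimal illustration, not a reference implementation: `contextualize_chunks`, `generate_context`, and `stub_context` are hypothetical names, and `stub_context` only stands in for the fast LLM call, which would need to be supplied by whatever client is in use.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Prepend a document-level context string to each chunk before embedding."""
    contextualized = []
    for chunk in chunks:
        # generate_context is assumed to wrap a fast LLM call that sees the
        # whole document plus the chunk and returns one or two sentences.
        context = generate_context(document, chunk).strip()
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def stub_context(document: str, chunk: str) -> str:
    """Placeholder for the LLM call: uses the first line as a title (assumption)."""
    title = document.splitlines()[0]
    return f"This excerpt is from the document titled '{title}'."


document = (
    "Q3 Revenue Report\n"
    "Revenue grew 3% over the previous quarter.\n"
    "Costs fell 1%."
)
chunks = ["Revenue grew 3% over the previous quarter.", "Costs fell 1%."]

# These strings, not the raw chunks, are what gets embedded and indexed.
contextualized = contextualize_chunks(document, chunks, stub_context)
```

The raw chunk text stays at the end of each string, so it can still be recovered for display after retrieval.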
Journey Context:
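Traditional RAG splits a document into chunks, embeds each chunk independently, and retrieves by similarity. Each embedding then sees only its own chunk, so meaning that depends on surrounding text (pronouns, section topics, terms defined elsewhere) is lost. Late chunking runs the entire document through the embedding model so that every token embedding reflects the full context, then pools the token embeddings within each chunk's span after the transformer layers; this requires an embedding model whose context window fits the document. Contextual retrieval prepends to each chunk a short LLM-generated description of where it sits in the document, then embeds the combined text. Both approaches counter the loss of meaning that isolated chunks suffer.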
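The pooling step of late chunking can be sketched as follows. This is a minimal illustration under stated assumptions: the token embeddings would normally come from a single forward pass of a long-context embedding model over the whole document, but here a small hand-written matrix stands in for that output. `late_chunk_pool` and `chunk_spans` are hypothetical names, and mean pooling is assumed as the pooling method.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    chunk_spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool contextualized token embeddings into one vector per chunk.

    chunk_spans holds (start, end) token indices, end exclusive.
    """
    n_tokens = len(token_embeddings)
    pooled = []
    for start, end in chunk_spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span ({start}, {end}) for {n_tokens} tokens")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension over the tokens that fall inside the chunk.
        pooled.append(
            [sum(tok[d] for tok in window) / len(window) for d in range(dim)]
        )
    return pooled


# Stand-in for the model output: 5 tokens, 2 dimensions each (assumption).
token_embeddings = [
    [1.0, 0.0],
    [3.0, 2.0],
    [0.0, 4.0],
    [2.0, 2.0],
    [4.0, 0.0],
]
chunk_spans = [(0, 2), (2, 5)]  # chunk boundaries expressed in token indices

chunk_vectors = late_chunk_pool(token_embeddings, chunk_spans)
print(chunk_vectors)  # [[2.0, 1.0], [2.0, 2.0]]
```

Chunk boundaries are applied only after the model has seen the full document, which is the difference from the chunk-then-embed order.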
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:11:58.160606+00:00— report_created — created