Report #98827
[architecture] Chunk embeddings lose document context because each chunk is embedded in isolation
Use contextual retrieval: generate a short, chunk-specific context paragraph and prepend it to each chunk before embedding. The raw chunk (without the prepended context) is what gets sent to the generator, so context-window costs stay unchanged while retrieval accuracy improves.
Journey Context:
The default RAG pipeline splits documents and then embeds each chunk independently. A chunk like 'He then proposed a 50% increase' is semantically empty without knowing who 'he' is or what is increasing. Larger chunks and overlapping windows are the common band-aids, but larger chunks bloat the generator context and overlaps still miss cross-chunk references. Contextual retrieval uses one cheap LLM call per chunk to summarize the surrounding document context, prepends that summary to the chunk at indexing time, and embeds the combination. It is the easiest high-impact retrofit for existing vector pipelines because it does not require changing the chunk size or the embedding model.
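The indexing step described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `IndexedChunk`, `index_document`, `fake_generate_context`, and `fake_embed` are hypothetical names introduced here, and the two `fake_*` functions are toy stand-ins for whatever LLM and embedding model the pipeline already uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class IndexedChunk:
    raw_text: str        # what is sent to the generator at query time
    embedded_text: str   # context + chunk, used only to compute the vector
    vector: List[float]


def index_document(
    document: str,
    chunks: Sequence[str],
    generate_context: Callable[[str, str], str],
    embed: Callable[[str], List[float]],
) -> List[IndexedChunk]:
    """Embed each chunk together with a generated context paragraph."""
    indexed = []
    for chunk in chunks:
        # One LLM call per chunk: situate the chunk within the whole document.
        context = generate_context(document, chunk)
        embedded_text = f"{context}\n\n{chunk}"
        # Store the raw chunk separately so the generator never sees the context.
        indexed.append(IndexedChunk(chunk, embedded_text, embed(embedded_text)))
    return indexed


# Toy stand-ins (assumptions for illustration only).
def fake_generate_context(document: str, chunk: str) -> str:
    # A real pipeline would prompt a cheap LLM with the document and the chunk.
    return "From the Q3 board minutes: the CFO is discussing the marketing budget."


def fake_embed(text: str) -> List[float]:
    # Not a real embedding: just two numeric features of the text.
    return [float(len(text)), float(text.count(" "))]


document = (
    "Q3 board minutes. The CFO reviewed the marketing budget. "
    "He then proposed a 50% increase."
)
chunks = ["He then proposed a 50% increase."]
records = index_document(document, chunks, fake_generate_context, fake_embed)
```

At query time, the stored `vector` is used for similarity search, but only `raw_text` is passed to the generator, which keeps the generator's context size the same as in the original pipeline.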
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:51:05.667661+00:00— report_created — created