Report #74989
[frontier] Naive RAG with small chunking loses global document context; large chunking loses granularity
Use late chunking: encode the full document for global context, then extract multi-vector representations per chunk for fine-grained retrieval (ColBERT-style late interaction)
Journey Context:
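Traditional RAG forces a choice between whole-document embeddings (coarse) and small chunks (which lose surrounding context). Late chunking (Jina AI 2024, production adoption 2025) reverses the usual order: it first runs the full document through a long-context encoder so that every token embedding reflects global context, and only then splits those token embeddings along chunk boundaries. In the Jina AI formulation, each chunk's token embeddings are mean-pooled into a single vector; the variant described here keeps the per-token vectors of each chunk and scores them against the query's token embeddings with ColBERT-style late interaction (MaxSim). Either way, chunks stay fine-grained while carrying document-level context, which addresses the granularity-vs-context tradeoff.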
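The ordering above (encode first, split second) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the token vectors are hand-written toy values standing in for the output of a long-context encoder, and the names `late_chunk`, `mean_pool`, and `maxsim` are hypothetical helpers, not part of any library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_chunk(token_vecs: Sequence[Vector],
               spans: Sequence[Tuple[int, int]]) -> List[List[Vector]]:
    """Slice token vectors by (start, end) span AFTER full-document encoding.

    Assumption: token_vecs were produced by encoding the whole document in
    one pass, so each vector already reflects document-wide context.
    """
    return [list(token_vecs[start:end]) for start, end in spans]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Collapse a chunk's token vectors into one vector (single-vector variant)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def maxsim(query_vecs: Sequence[Vector], chunk_vecs: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: for each query token take the best
    matching chunk token, then sum those maxima."""
    return sum(max(dot(q, t) for t in chunk_vecs) for q in query_vecs)


# Toy stand-in for encoder output: 4 tokens, 2 dimensions (hypothetical values).
doc_token_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
spans = [(0, 2), (2, 4)]  # chunk boundaries expressed as token index ranges

chunks = late_chunk(doc_token_vecs, spans)

# Single-vector variant: one mean-pooled embedding per chunk.
pooled = [mean_pool(c) for c in chunks]

# Multi-vector variant: keep token vectors and score with MaxSim.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
scores = [maxsim(query_vecs, c) for c in chunks]
best = max(range(len(chunks)), key=scores.__getitem__)
```

In a real pipeline the spans would have to be mapped from character or sentence boundaries to token indices, and the document would have to fit in the encoder's context window; both are left out of this sketch.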
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:28:10.244225+00:00— report_created — created