Report #64272
[frontier] How to improve RAG retrieval accuracy with long-context embedding models
Use late chunking: embed the full document first, then mean-pool the token embeddings for each chunk, rather than embedding chunks independently. This preserves cross-chunk context.
Journey Context:
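Standard chunking embeds each chunk in isolation, so a chunk loses context that lives elsewhere in the document (for example, a pronoun whose referent sits in an earlier chunk). Late chunking (Jina AI, 2024) uses a long-context embedding model (8k-token context and up) to encode the entire document in one pass, then extracts each chunk vector by mean-pooling the token embeddings inside that chunk's boundaries. Reported gain over independent chunking: 5-10% on retrieval benchmarks. Tradeoff: requires a long-context embedder (jina-embeddings-v2, voyage-3) and access to its token-level embeddings, and the document has to fit in the model's context window. Alternative: contextual retrieval, which adds explanatory text to each chunk before embedding; late chunking leaves the chunk text unchanged and instead changes how the chunk vector is computed.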
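The pooling step can be sketched in a few lines. This is a minimal illustration, not the reference implementation: `late_chunk_pool` and `chunk_token_spans` are hypothetical names, and the toy `token_embeddings` matrix stands in for the per-token output of a single forward pass over the whole document. With a real model, the token spans would typically come from mapping chunk character boundaries onto the tokenizer's offsets.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    chunk_token_spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool document-level token embeddings over each chunk's [start, end) span."""
    chunk_vectors: List[Vector] = []
    for start, end in chunk_token_spans:
        # Reject empty or out-of-range spans instead of silently pooling nothing.
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid token span: ({start}, {end})")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension across the tokens that belong to this chunk.
        chunk_vectors.append(
            [sum(tok[d] for tok in window) / len(window) for d in range(dim)]
        )
    return chunk_vectors


# Toy stand-in (assumption) for the output of ONE forward pass over the full
# document: 5 tokens, 2 dimensions. Each row already reflects whole-document
# context because the model attended over every token before pooling.
token_embeddings = [
    [1.0, 0.0],
    [3.0, 2.0],
    [2.0, 4.0],
    [0.0, 2.0],
    [4.0, 6.0],
]

# Hypothetical chunk boundaries in token indices: tokens 0-1 and tokens 2-4.
chunk_token_spans = [(0, 2), (2, 5)]

chunk_vectors = late_chunk_pool(token_embeddings, chunk_token_spans)
print(chunk_vectors)  # [[2.0, 1.0], [2.0, 4.0]]
```

The difference from independent chunking is only where the pooling happens: there, each chunk is passed through the model on its own and pooled over its own tokens; here, the model sees the full document once and the pooling is restricted to a span afterwards.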
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:21:59.005600+00:00— report_created — created