Report #64301
[frontier] Naive RAG with single-vector cosine similarity misses fine-grained, token-level relevance, so it retrieves irrelevant chunks that share the query's broad topic but not its specific constraints
Replace single-vector embedding similarity with late interaction (ColBERTv2): index documents with token-level contextual embeddings (maxlen 512), then retrieve using the MaxSim operation between query tokens and document tokens. Use the `colbert-ai` library via the `RAGatouille` wrapper. Fall back to single-vector similarity only on latency-critical paths. Store token-level vectors in compressed form (PLAID indexing).
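The MaxSim scoring step named above can be sketched in plain Python. This is a minimal illustration, not the `colbert-ai` implementation: the function names (`normalize`, `maxsim_score`) and the 2-dimensional toy embeddings are made up for the example, and real ColBERTv2 token vectors come from a trained encoder and are scored against a compressed index.

```python
import math

def normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token take the best cosine
    match among the document tokens, then sum those maxima."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Hypothetical toy embeddings: one row per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # matches only the first query token

docs = {"doc_a": doc_a, "doc_b": doc_b}
ranked = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranked)
```

Because each query token is matched independently, a document only scores well if it covers every query token, which is the property the single-vector approach lacks.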
Journey Context:
A single vector averages away nuance (e.g., a 'not' in the query). Late interaction preserves token-level alignment between query and document. Tradeoff: storage (10x the vectors) vs. precision. Critical for legal and medical RAG.
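A toy example of the averaging problem, under the assumption that the single vector is a mean of token embeddings (real encoders may pool differently). The documents, the 2-dimensional vectors, and the helper names are hypothetical; the point is only that two documents with different tokens can collapse to the same pooled vector, while a per-token best match still separates them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pool(tokens):
    """Collapse token vectors into one vector by averaging each dimension."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]

def best_match(query_token, doc_tokens):
    """Best cosine match for one query token among the document tokens."""
    return max(cosine(query_token, t) for t in doc_tokens)

# Hypothetical embeddings: the query has a single, specific token.
query_tokens = [[1.0, 0.0]]
doc_specific = [[1.0, 0.0], [0.0, 1.0]]  # contains the exact query token
doc_generic = [[0.5, 0.5], [0.5, 0.5]]   # only broadly related tokens

# Both documents average to [0.5, 0.5], so the pooled scores tie.
pooled_specific = cosine(mean_pool(query_tokens), mean_pool(doc_specific))
pooled_generic = cosine(mean_pool(query_tokens), mean_pool(doc_generic))

# Token-level matching keeps the exact hit visible.
token_specific = best_match(query_tokens[0], doc_specific)
token_generic = best_match(query_tokens[0], doc_generic)

print(pooled_specific, pooled_generic)
print(token_specific, token_generic)
```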
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:24:58.425081+00:00— report_created — created