Report #92323
[frontier] How do I prevent stale information from dominating RAG results when the underlying data changes frequently?
Implement time-decay vector search: apply an exponential decay function to similarity scores based on metadata timestamps (e.g., $\mathrm{score} \times e^{-\lambda (t_{\mathrm{now}} - t_{\mathrm{doc}})}$), or use hybrid search that combines recency and relevance vectors.
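The decay rate $\lambda$ is easier to tune when expressed as a half-life. The relation below is a worked sketch, not part of the original report: the half-life $h$ and the 30-day and 60-day figures are illustrative assumptions, and $\Delta t = t_{\mathrm{now}} - t_{\mathrm{doc}}$ is measured in the same unit as $h$.

```latex
\lambda = \frac{\ln 2}{h}
\quad\Longrightarrow\quad
\mathrm{score} \times e^{-\lambda \,\Delta t} = \mathrm{score} \times 2^{-\Delta t / h}

% Illustrative numbers (assumed): h = 30 days, \Delta t = 60 days
\mathrm{score} \times 2^{-60/30} = 0.25 \times \mathrm{score}
```

With this parameterization, a document exactly one half-life old keeps half of its raw similarity, which gives a concrete handle for choosing $\lambda$ per corpus.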
Journey Context:
Naive RAG retrieves old documents that have high semantic similarity but outdated facts (e.g., 'latest Python version'). Adding simple date filters excludes older content that is still relevant. Time-decay algorithms continuously reduce the effective similarity of older vectors without hard cutoffs, so 'evergreen' older content can remain if it is semantically unique, while recent updates surface first. This requires storing timestamp metadata on vector chunks and applying the decay either in the retrieval layer (pre-vector search) or in post-processing with weighted scores. Tradeoff: it adds compute to similarity calculations and requires consistent timestamp metadata.
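The post-processing variant described above might be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the `Hit` record, the `decayed_score` and `rerank` helpers, the 30-day half-life, and the sample documents are all hypothetical, and raw similarities are assumed to be non-negative (multiplicative decay would otherwise push negative scores upward).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Hit:
    """Hypothetical record for one retrieved chunk."""
    doc_id: str
    similarity: float    # raw similarity from the vector store (assumed >= 0)
    timestamp: datetime  # timezone-aware timestamp stored as chunk metadata


def decayed_score(similarity: float, age_days: float, half_life_days: float) -> float:
    """Return similarity * exp(-lambda * age), with lambda = ln(2) / half_life."""
    decay_rate = math.log(2) / half_life_days
    # Clamp negative ages (timestamps in the future) so they are never boosted.
    return similarity * math.exp(-decay_rate * max(age_days, 0.0))


def rerank(hits: list[Hit], now: datetime, half_life_days: float = 30.0) -> list[tuple[str, float]]:
    """Re-score already retrieved hits by age and sort best-first."""
    scored = []
    for hit in hits:
        age_days = (now - hit.timestamp).total_seconds() / 86400.0
        scored.append((hit.doc_id, decayed_score(hit.similarity, age_days, half_life_days)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data (assumed): three candidates returned by a vector search.
now = datetime(2026, 6, 22, tzinfo=timezone.utc)
hits = [
    Hit("old_release_notes", 0.92, now - timedelta(days=90)),
    Hit("new_release_notes", 0.85, now - timedelta(days=3)),
    Hit("evergreen_guide", 0.95, now - timedelta(days=30)),
]
ranked = rerank(hits, now)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

In this sample the 3-day-old document ranks first despite its lower raw similarity, and the 30-day-old guide stays in the results with half its raw score instead of being filtered out. Because the decay runs only on the candidates already returned, the extra compute scales with the candidate count, so it can help to over-fetch from the vector store before re-ranking.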
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:33:24.040712+00:00— report_created — created