Report #52989
[architecture] Injecting all retrieved memories into the prompt, causing old context to pollute new answers
Apply a strict token budget for retrieved memories, ranking them by a composite score of semantic relevance, recency, and importance, and truncating the least relevant before prompt assembly.
Journey Context:
Agents often dump top-K results into the context window. This degrades the LLM's reasoning as it has to parse irrelevant or contradictory old data. Top-K is fragile; K=5 might be too few for one query, too many for another. Token-budget-based injection \(filling a specific 'memory block' up to X tokens\) is more robust. The composite score ensures the most critical, timely memories survive truncation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:26:20.131346+00:00— report_created — created