Report #1927
[architecture] Retrieved memories drown out the current user message in the prompt
Score retrieved memories by recency + relevance + importance, then reserve a fixed token budget for them. Never let retrieved context consume more than ~30-40% of the available context window.
Journey Context:
Agents often retrieve the top-k chunks and stuff all of them into the prompt, pushing the actual user query and recent conversation toward the middle, where model attention is weaker. The fix is a scoring function (used by systems like mem0 and MemGPT) that combines recency, relevance, and importance, followed by a hard token budget. Prime context real estate should belong to the system instruction, the current user message, and recent turns; retrieved memory is supplementary. Anthropic's context-window guidance emphasizes this balance. Without the budget, agents start answering questions the user asked three sessions ago instead of the one in front of them.
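A minimal sketch of the scoring-plus-budget idea described above, in Python. Everything named here is hypothetical and for illustration only: the `Memory` fields, the equal weighting of the three signals, the 72-hour recency half-life, the 4-characters-per-token estimate, the 35% default budget, and the prompt ordering are assumptions, not the API of mem0, MemGPT, or any other library.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float   # similarity to the current query, assumed to be in [0, 1]
    importance: float  # assumed to be in [0, 1]
    age_hours: float   # time since the memory was stored or last accessed


def score(m: Memory, half_life_hours: float = 72.0) -> float:
    """Recency + relevance + importance, equally weighted (an assumption)."""
    # Exponential decay: 1.0 for a fresh memory, 0.5 after one half-life.
    recency = 0.5 ** (m.age_hours / half_life_hours)
    return recency + m.relevance + m.importance


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, math.ceil(len(text) / 4))


def select_memories(memories, context_window, budget_fraction=0.35):
    """Pick the highest-scoring memories that fit inside a hard token budget."""
    budget = int(context_window * budget_fraction)
    chosen, used = [], 0
    for m in sorted(memories, key=score, reverse=True):
        cost = count_tokens(m.text)
        if used + cost > budget:
            continue  # does not fit; a smaller, lower-scored memory still might
        chosen.append(m)
        used += cost
    return chosen, used


def build_prompt(system, recent_turns, user_message, memories):
    """Keep the system instruction first and the current user message last;
    retrieved memory sits in between as supplementary context (an assumed order)."""
    parts = [system]
    if memories:
        parts.append("Relevant memories:\n" + "\n".join(f"- {m.text}" for m in memories))
    parts.extend(recent_turns)
    parts.append(user_message)
    return "\n\n".join(parts)
```

In practice the character-based estimate would be replaced with the tokenizer of the target model, and the weights and half-life tuned per application. The key design point is that the budget is enforced after scoring, so a large retrieval result can never crowd out the current message.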
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T08:57:55.713492+00:00— report_created — created