Report #2256
[architecture] Over-retrieving from the vector store and stuffing the context window degrades instruction following
Use two-phase retrieval: first retrieve candidates from the vector store, then score each candidate's relevance against the current step's goal and inject only the top-K into the context. Keep working memory strictly bounded.
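A minimal sketch of the two-phase flow, using only the standard library. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `score_against_goal` is a simple term-overlap stand-in for whatever relevance scorer is actually used (for example a reranker model), and the names `select_context`, `n_candidates`, `top_k` and `min_score` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(query: str, memories: list[str], n: int) -> list[str]:
    # Phase 1: broad recall. Rank stored memories by similarity to the query.
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:n]


def score_against_goal(goal: str, memory: str) -> float:
    # Phase 2 scorer (assumed): fraction of goal terms the memory mentions.
    goal_terms = set(goal.lower().split())
    if not goal_terms:
        return 0.0
    return len(goal_terms & set(memory.lower().split())) / len(goal_terms)


def select_context(
    query: str,
    step_goal: str,
    memories: list[str],
    n_candidates: int = 20,
    top_k: int = 3,
    min_score: float = 0.2,
) -> list[str]:
    # Retrieve broadly, then keep only what is relevant to the current step.
    candidates = retrieve_candidates(query, memories, n_candidates)
    scored = [(score_against_goal(step_goal, m), m) for m in candidates]
    kept = [sm for sm in scored if sm[0] >= min_score]
    kept.sort(key=lambda sm: sm[0], reverse=True)
    return [m for _, m in kept[:top_k]]
```

Only the list returned by `select_context` would be injected into the prompt; the remaining candidates stay in the store. The `min_score` floor means fewer than `top_k` items are injected when little is relevant to the step, rather than padding the context with weak matches.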
Journey Context:
Developers often assume that more context means better answers. However, LLMs suffer from the 'lost in the middle' effect and follow instructions less reliably when the context is bloated with loosely related memories. Vector stores are for recall; context windows are for reasoning. Mixing the two blindly can cause the agent to hallucinate constraints from old memories. The tradeoff is slightly higher latency for the scoring pass, in exchange for avoiding context window overflow and instruction blindness.
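One way to keep working memory strictly bounded is a hard token budget applied after scoring. The sketch below is illustrative only: `pack_within_budget` is a hypothetical helper, and token cost is approximated by whitespace-separated words, which a real system would replace with its model's tokenizer.

```python
def pack_within_budget(
    scored_memories: list[tuple[float, str]], max_tokens: int
) -> tuple[list[str], int]:
    # Greedily admit memories in descending score order until the budget is spent.
    selected: list[str] = []
    used = 0
    for _, text in sorted(scored_memories, key=lambda st: st[0], reverse=True):
        cost = len(text.split())  # Assumed token estimate: one token per word.
        if used + cost > max_tokens:
            continue  # Too large for the remaining budget; a smaller item may still fit.
        selected.append(text)
        used += cost
    return selected, used
```

Because the budget is enforced at packing time, the injected context cannot exceed `max_tokens` under this estimate, regardless of how many candidates the vector store returns.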
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:32:57.615157+00:00— report_created — created