Report #2132
[architecture] Should I stuff everything into the context window or retrieve it from a vector store?
Put the immediate task framing, the active user intent, and the currently relevant facts in the context window; retrieve background knowledge only when needed. Never assume the model will read a long list of retrieved chunks faithfully: retrieve candidates, re-rank them, then inject only the top-k most relevant chunks with clear separators.
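A minimal sketch of that prompt layout, assuming the chunks already carry a relevance score from a re-ranker. The names `Chunk`, `build_prompt`, the section headings, and the separator strings are all hypothetical choices for illustration, not part of any library:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # where the chunk came from (hypothetical field)
    text: str
    score: float  # relevance score assigned by a re-ranker (assumed to exist)


def build_prompt(task: str, user_intent: str, facts: list[str],
                 chunks: list[Chunk], top_k: int = 3) -> str:
    """Assemble a prompt: working-memory items first, then top-k retrieved chunks."""
    # Keep only the k highest-scoring chunks instead of dumping everything.
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]

    parts = [
        "## Task", task,
        "## User intent", user_intent,
        "## Current facts", *[f"- {fact}" for fact in facts],
        "## Retrieved background",
    ]
    # Delimit each chunk explicitly so the model can tell them apart.
    for i, chunk in enumerate(best, start=1):
        parts.append(f"--- chunk {i} (source: {chunk.source}) ---")
        parts.append(chunk.text)
    parts.append("--- end of retrieved background ---")
    return "\n".join(parts)
```

Calling `build_prompt(task, intent, facts, chunks, top_k=3)` with 20 scored chunks yields a prompt containing only the three highest-scoring ones, each behind its own separator line.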
Journey Context:
Teams default to "just RAG it" and dump 20 chunks into the prompt, but LLMs systematically miss information in the middle of long contexts and perform worse than they do with a tight prompt plus a smaller, curated set of chunks. The tradeoff is latency/cost (context) vs. coverage (retrieval). The right split: context window = working memory, vector store = long-term memory. Retrieval must include a re-ranking step, because cosine similarity alone returns text that is semantically close but irrelevant to the task.
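The two-stage retrieval described above could be sketched as follows. This is an illustration under stated assumptions: the index is a plain in-memory list of (text, vector) pairs rather than a real vector store, and `keyword_overlap` is a toy stand-in for a real re-ranker such as a cross-encoder. All function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, n_candidates=20):
    """Stage 1: broad recall by cosine similarity over (text, vector) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:n_candidates]]


def keyword_overlap(task: str, text: str) -> float:
    """Toy relevance score (ASSUMPTION: stands in for a real re-ranking model)."""
    task_terms = set(task.lower().split())
    text_terms = set(text.lower().split())
    return len(task_terms & text_terms) / (len(task_terms) or 1)


def rerank(task, candidates, score_fn=keyword_overlap, top_k=3):
    """Stage 2: re-score candidates against the actual task, keep the top-k."""
    return sorted(candidates, key=lambda t: score_fn(task, t), reverse=True)[:top_k]
```

The design point is that stage 1 optimizes for recall (a wide candidate set) while stage 2 optimizes for precision against the task, so the text that is closest in embedding space does not automatically reach the prompt.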
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:59:38.290601+00:00— report_created — created