Report #2700
[architecture] Agent ignores system instructions after retrieving large documents because the instructions are pushed out of the attention window
Cap the token count of each retrieved context chunk. Summarize or map-reduce retrieved documents before injection, so that the system prompt and immediate task instructions always occupy the majority of the context window.
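A minimal sketch of the capping step, under stated assumptions: token counts are approximated by whitespace-separated words (a real tokenizer will give different numbers), chunks arrive already ranked by relevance, and the names `cap_retrieved_chunks`, `per_chunk_cap`, and `budget` are hypothetical, not taken from any library.

```python
from typing import List


def cap_retrieved_chunks(chunks: List[str], per_chunk_cap: int, budget: int) -> List[str]:
    """Truncate each chunk to per_chunk_cap words and stop once budget words are used.

    Word counts stand in for token counts here (an assumption for illustration).
    Chunks are assumed to be sorted best-first, so lower-ranked ones are dropped.
    """
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        words = chunk.split()
        # Per-chunk cap: no single document may dominate the prompt.
        if len(words) > per_chunk_cap:
            words = words[:per_chunk_cap]
        # Overall cap: stop before the retrieved text exceeds its total budget.
        if used + len(words) > budget:
            break
        kept.append(" ".join(words))
        used += len(words)
    return kept


ranked = ["a b c d e", "f g", "h i j"]
capped = cap_retrieved_chunks(ranked, per_chunk_cap=3, budget=6)
print(capped)
```

The caller would choose `budget` relative to the size of the system prompt and task instructions, so the retrieved text cannot outgrow them.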
Journey Context:
A common mistake is to dump entire files or top-K vector results directly into the prompt. This overwhelms the LLM's attention mechanism, causing it to ignore subtle system instructions (like output format constraints) in favor of the large injected text. The tradeoff is that aggressive summarization might omit a crucial detail, but an agent that follows its format constraints with 90% of the data is far more functional than an agent that hallucinates JSON with 100% of the data.
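One way the summarize-before-injection alternative could look is a map-reduce pass, sketched below. The `summarize` callable is a placeholder for whatever model call the agent uses (an assumption, not a real API), word counts again stand in for tokens, and `map_reduce_summary`, `max_words`, and `max_rounds` are hypothetical names. The final hard cut is where the tradeoff described above appears: details past the limit are lost.

```python
from typing import Callable, List


def map_reduce_summary(
    documents: List[str],
    summarize: Callable[[str], str],
    max_words: int,
    max_rounds: int = 3,
) -> str:
    """Summarize each document on its own, then condense the joined summaries."""
    # Map: each document is summarized independently.
    partials = [summarize(doc) for doc in documents]
    combined = "\n".join(partials)

    # Reduce: re-summarize the joined text until it fits, for a bounded number of rounds.
    rounds = 0
    while len(combined.split()) > max_words and rounds < max_rounds:
        combined = summarize(combined)
        rounds += 1

    # Fallback: hard truncation guarantees the cap even if the summarizer is too verbose.
    words = combined.split()
    if len(words) > max_words:
        combined = " ".join(words[:max_words])
    return combined


def first_three_words(text: str) -> str:
    """Stand-in summarizer for the demo; a real agent would call its model here."""
    return " ".join(text.split()[:3])


docs = ["one two three four five", "six seven eight nine ten"]
print(map_reduce_summary(docs, first_three_words, max_words=4))
```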
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:36:49.929689+00:00— report_created — created