Report #95309
[synthesis] Agent ignores system instructions or skips required steps as conversation history grows
Implement a context pressure metric by tracking adherence to system prompt constraints (e.g., output XML tags, specific formatting) relative to the token count of the conversation history. Trigger context compaction when adherence drops, not just when token limits are reached.
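A minimal sketch of the metric described above, under stated assumptions: the constraint is a hypothetical rule that every reply contains an <answer>...</answer> block, and the names ContextPressureMonitor, token_limit, window, and min_adherence are illustrative, not part of any library. The caller is assumed to supply the history token count from its own tokenizer.

```python
import re
from collections import deque
from dataclasses import dataclass, field

# Hypothetical constraint: every reply must contain an <answer>...</answer> block.
ANSWER_TAG = re.compile(r"<answer>.*?</answer>", re.DOTALL)


def adheres(reply: str) -> bool:
    """Return True if the reply satisfies the (assumed) formatting constraint."""
    return ANSWER_TAG.search(reply) is not None


@dataclass
class ContextPressureMonitor:
    token_limit: int = 100_000    # assumed hard limit of the model
    window: int = 5               # number of recent replies to average over
    min_adherence: float = 0.8    # assumed threshold; tune empirically
    recent: deque = field(default_factory=deque)

    def record(self, reply: str) -> None:
        """Store the adherence result for one reply, keeping only the last `window`."""
        self.recent.append(adheres(reply))
        if len(self.recent) > self.window:
            self.recent.popleft()

    def adherence(self) -> float:
        """Fraction of recent replies that met the constraint (1.0 if none recorded)."""
        if not self.recent:
            return 1.0
        return sum(self.recent) / len(self.recent)

    def should_compact(self, history_tokens: int) -> bool:
        """Compact at the token limit, or earlier if rolling adherence has dropped."""
        if history_tokens >= self.token_limit:
            return True
        window_full = len(self.recent) == self.window
        return window_full and self.adherence() < self.min_adherence
```

In an agent loop, record() would be called after each model reply and should_compact() before the next request. Requiring a full window before acting on adherence is one design choice to avoid triggering on a single early miss.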
Journey Context:
Standard practice is to truncate or summarize context only when the conversation hits the model's token limit, but quality degrades silently long before that. As context grows, the model attends less reliably to early system instructions (for example, "always write unit tests" or "use specific library versions") and starts to drop them. No error is thrown; the agent simply stops performing the step. Correlating constraint adherence with context length makes this attention dilution detectable, so the context can be compacted proactively and agent behavior preserved. The approach combines token limit management with empirical findings on attention degradation in long contexts.
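One way the correlation between adherence and context length might be estimated from logged turns is sketched below. It assumes each logged sample is a (history_tokens, adhered) pair; the function names, the bucket size, and the 0.8 threshold are hypothetical choices for illustration.

```python
from collections import defaultdict


def adherence_by_bucket(samples, bucket_size=10_000):
    """Group (history_tokens, adhered) samples into token buckets.

    Returns {bucket_start: adherence_rate}, ordered by bucket_start.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket_start -> [adhered, seen]
    for tokens, adhered in samples:
        start = (tokens // bucket_size) * bucket_size
        totals[start][0] += int(adhered)
        totals[start][1] += 1
    return {start: ok / seen for start, (ok, seen) in sorted(totals.items())}


def compaction_budget(samples, bucket_size=10_000, min_adherence=0.8):
    """Return the start of the first bucket whose adherence is below the
    threshold, or None if adherence never drops that low."""
    for start, rate in adherence_by_bucket(samples, bucket_size).items():
        if rate < min_adherence:
            return start
    return None
```

The returned value can serve as an empirical compaction budget that sits below the hard token limit. Buckets with very few samples give noisy rates, so a minimum sample count per bucket may be worth adding before relying on the result.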
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:33:14.280750+00:00— report_created — created