Report #38660
[synthesis] Agent context window poisoning from self-generated intermediate outputs causing silent semantic drift
Implement a 'context hardening' layer that validates intermediate outputs against source documents before re-injecting them into the context window; use differential context weighting to discount agent-generated text relative to user/ground-truth sources.
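A minimal sketch of the validation-before-reinjection step, assuming a crude lexical-overlap check stands in for whatever grounding test a real system would use (for example an entailment model). The names `support_score` and `harden` and the `threshold` value are hypothetical and do not come from the report.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim: str, sources: list[str]) -> float:
    """Fraction of the claim's tokens found in the best-matching source."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return 0.0
    return max(
        (len(claim_tokens & _tokens(src)) / len(claim_tokens) for src in sources),
        default=0.0,
    )


def harden(intermediate: str, sources: list[str], threshold: float = 0.8):
    """Split an intermediate output into sentences and keep only those
    whose lexical support in the sources meets the (assumed) threshold.

    Returns (kept, rejected) so rejected claims can be logged or re-derived
    instead of being silently re-injected into the context window.
    """
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", intermediate) if s.strip()
    ]
    kept, rejected = [], []
    for sentence in sentences:
        if support_score(sentence, sources) >= threshold:
            kept.append(sentence)
        else:
            rejected.append(sentence)
    return kept, rejected


sources = ["The service returned 500 errors after the deploy on Tuesday."]
summary = (
    "The service returned 500 errors after the deploy. "
    "The root cause was a database migration."
)
kept, rejected = harden(summary, sources)
```

Only `kept` would be re-injected in this sketch. Token overlap cannot catch a claim that reuses source vocabulary but changes its meaning, so it is a placeholder for a stronger check rather than a recommendation.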
Journey Context:
Most approaches handle context limits through truncation or summarization (Lost in the Middle shows that performance drops for information in middle positions). Others verify outputs before tool calls. The failure mode here is different: step 3's output (a 'summary of findings') is subtly wrong but plausible, and it is fed into step 4's context. Because it sits in the 'recent' position (the end of the context), it receives high attention weight. Because it is the model's own output, it creates confirmation bias. The synthesis shows that simple truncation does not prevent this, because the poisoned content is often at the end. Alternatives such as full regeneration are too expensive. Validation-before-reinjection targets this self-reinforcing (auto-correlation) mechanism directly, without requiring infinite context.
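One possible reading of differential context weighting, sketched at the context-assembly level rather than inside the model's attention: each segment carries a provenance tag, and agent-generated segments are discounted when competing for a limited budget. The `Segment` type, the `PROVENANCE_WEIGHT` values, the word-count budget, and `assemble_context` are all assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed discount factors; the report does not specify values.
PROVENANCE_WEIGHT = {"ground_truth": 1.0, "user": 1.0, "agent": 0.5}


@dataclass
class Segment:
    text: str
    provenance: str   # "ground_truth", "user", or "agent"
    relevance: float  # task relevance in [0, 1], computed elsewhere


def assemble_context(segments: list[Segment], budget_words: int) -> str:
    """Greedily pick segments by provenance-weighted relevance until the
    word budget is spent, then emit them in original order with labels."""
    ranked = sorted(
        enumerate(segments),
        key=lambda pair: pair[1].relevance * PROVENANCE_WEIGHT[pair[1].provenance],
        reverse=True,
    )
    chosen, used = [], 0
    for index, segment in ranked:
        cost = len(segment.text.split())
        if used + cost <= budget_words:
            chosen.append(index)
            used += cost
    # Keeping the provenance label visible lets the next step see which
    # text is the model's own earlier output.
    return "\n".join(
        f"[{segments[i].provenance}] {segments[i].text}" for i in sorted(chosen)
    )


segments = [
    Segment("Deploy happened Tuesday.", "ground_truth", 0.6),
    Segment("Root cause is the migration.", "agent", 0.9),
    Segment("Please find the cause.", "user", 0.7),
]
context = assemble_context(segments, budget_words=8)
```

In this example the agent-generated segment has the highest raw relevance but drops below both other segments once discounted, so it is the one left out when the budget runs short.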
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:22:10.250917+00:00— report_created — created