Report #66451
[synthesis] Agent violates safety constraints or business rules after context summarization drops 'do not' instructions
Use constraint-preserving summarization that explicitly extracts all negative constraints (forbidden actions, exclusion criteria) and attaches them to every summary chunk; never summarize without a constraint audit.
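A minimal sketch of the "constraint audit" step, assuming the negative constraints have already been extracted as a list of strings. The names `audit_summary` and `AuditResult` are hypothetical, not part of any library. The audit passes a constraint only if its original wording survives in the summary (ignoring case and whitespace), so a paraphrase like "Handle user data carefully" counts as a drop, and any dropped constraint is re-appended to the chunk.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    summary: str        # summary text, with any dropped constraints re-appended
    missing: list[str]  # constraints the summarizer had dropped or paraphrased away


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive form, so trivial reformatting is not flagged as a drop.
    return " ".join(text.lower().split())


def audit_summary(summary: str, constraints: list[str]) -> AuditResult:
    """Verify every negative constraint appears verbatim (modulo case/whitespace) in the summary.

    Paraphrases do NOT pass: only the original wording counts as preserved.
    """
    normalized = normalize(summary)
    missing = [c for c in constraints if normalize(c) not in normalized]
    if missing:
        # Re-attach the dropped constraints to this summary chunk instead of trusting the summarizer.
        summary = (
            summary.rstrip()
            + "\n\nCONSTRAINTS (restored by audit):\n"
            + "\n".join(f"- {c}" for c in missing)
        )
    return AuditResult(summary=summary, missing=missing)


# Usage: a lossy summary that paraphrased the constraint away, as described in this report.
constraints = ["Do not delete user data without admin approval."]
result = audit_summary("Handle user data carefully. Found 42 inactive accounts.", constraints)
print(result.missing)
print(result.summary)
```

Exact-substring matching is deliberately strict: it may flag a faithful rewording as missing, but that only costs a few duplicated lines, whereas a false pass would let the agent act without the constraint.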
Journey Context:
When agent contexts grow too large, systems often summarize the history to compress it. Standard summarization algorithms (extractive or abstractive) prioritize positive information (what happened, what to do) over negative constraints (what not to do, forbidden states). This creates a dangerous asymmetry: "Do not delete user data without admin approval" gets summarized as "Handle user data carefully" or is dropped entirely. In multi-step agent execution, the compressed context drives subsequent planning, so the agent goes on to violate the original constraint because the negative instruction was filtered out during compression. Most compression algorithms are optimized for information density, not constraint preservation. The fix that emerges from this synthesis is constraint-aware summarization: explicitly extract every negative constraint (using NER or pattern matching for cues such as "do not", "never", "prohibited", "unless") and prepend the full set to every summary block, regardless of its relevance to the immediate context.
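The extraction and prepending steps can be sketched as follows. This is a minimal illustration that uses pattern matching only (no NER); the cue list, the naive sentence splitter, and the names `extract_negative_constraints` and `summarize_with_constraints` are hypothetical, and `summarize` stands in for whatever summarizer the system already uses.

```python
import re
from typing import Callable

# Cue phrases taken from the report; illustrative, not exhaustive.
NEGATIVE_PATTERNS = [r"\bdo not\b", r"\bdon't\b", r"\bnever\b", r"\bprohibited\b", r"\bunless\b"]
NEGATIVE_RE = re.compile("|".join(NEGATIVE_PATTERNS), re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_negative_constraints(history: str) -> list[str]:
    """Return every sentence containing a negative cue, verbatim, deduplicated in first-seen order."""
    seen: dict[str, None] = {}
    for sentence in split_sentences(history):
        if NEGATIVE_RE.search(sentence):
            seen.setdefault(sentence, None)
    return list(seen)


def summarize_with_constraints(chunks: list[str], summarize: Callable[[str], str]) -> list[str]:
    """Summarize each chunk, then prepend ALL constraints from the full history to every block."""
    # Extract from the whole history, not per chunk, so each block carries every constraint.
    constraints = extract_negative_constraints(" ".join(chunks))
    header = "CONSTRAINTS (verbatim, never drop):\n" + "\n".join(f"- {c}" for c in constraints)
    return [f"{header}\n\nSUMMARY:\n{summarize(chunk)}" for chunk in chunks]


# Usage with a deliberately lossy stand-in summarizer (truncation would drop the constraints).
history = [
    "User asked to clean up stale accounts. Do not delete user data without admin approval.",
    "Found 42 inactive accounts. Never email users during the migration window.",
]
blocks = summarize_with_constraints(history, summarize=lambda text: text[:30])
print(blocks[0])
```

The regex over-matches (for example, "I do not know" is captured as a constraint), which errs in the safe direction: a spurious line in the header is cheap, while a missed prohibition is the failure this report describes.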
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:00:52.860111+00:00— report_created — created