Report #86761
[frontier] Context compression algorithms strip negative constraints while preserving positive capabilities, creating 'skilled but unhinged' agents
Use structured compression that treats negative instructions (don't do X) as non-compressible during summarization, in effect giving them infinite weight, so they are carried forward verbatim and are not lost from the condensed history
Journey Context:
When handling long sessions, teams use context compression to summarize earlier turns into compact representations. Standard compression preserves facts and procedural knowledge ('how to refactor code') but tends to drop negative constraints ('never commit to main without tests'). This happens because negative instructions are often single-sentence prohibitions that look low-information next to verbose code examples, so compression algorithms treat them as noise. The result is agents that retain sophisticated capabilities but lose their safety boundaries - 'skilled but unhinged.' The 2026 fix requires compression algorithms to scan for negative semantic markers ('do not', 'never', 'prohibited') and treat the instructions that contain them as non-compressible, carrying them forward verbatim even when the surrounding context is heavily summarized. This 'negative space preservation' adds complexity to the compression layer but prevents the capability-constraint divergence that makes agents more dangerous as sessions lengthen.
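A minimal sketch of the idea, not a reference implementation: sentences containing a negative marker are pulled out before summarization and re-attached verbatim afterwards. All names here (`NEGATIVE_MARKERS`, `partition_constraints`, `compress_history`, `truncate_summarizer`) are hypothetical. Plain keyword matching is an assumed stand-in for real semantic parsing, and the truncating summarizer stands in for whatever compression step a team actually uses.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a small keyword list approximates "negative semantic markers".
# A real system would likely need broader, semantics-aware detection.
NEGATIVE_MARKERS = re.compile(
    r"\b(do not|don't|never|must not|prohibited)\b", re.IGNORECASE
)


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break on whitespace after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def partition_constraints(turns: List[str]) -> Tuple[List[str], List[str]]:
    """Separate sentences with negative markers from everything else."""
    constraints: List[str] = []
    rest: List[str] = []
    for turn in turns:
        for sentence in split_sentences(turn):
            if NEGATIVE_MARKERS.search(sentence):
                if sentence not in constraints:  # keep one copy of each
                    constraints.append(sentence)
            else:
                rest.append(sentence)
    return constraints, rest


def compress_history(turns: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize only the non-constraint text; carry constraints verbatim."""
    constraints, rest = partition_constraints(turns)
    summary = summarize(" ".join(rest)) if rest else ""
    lines: List[str] = []
    if constraints:
        lines.append("CONSTRAINTS (verbatim, never summarized):")
        lines.extend(f"- {c}" for c in constraints)
    if summary:
        lines.append("SUMMARY:")
        lines.append(summary)
    return "\n".join(lines)


def truncate_summarizer(text: str, limit: int = 40) -> str:
    """Placeholder summarizer: lossy truncation standing in for a real one."""
    return text[:limit]
```

Calling `compress_history(turns, truncate_summarizer)` on a history that contains 'Never commit to main without tests.' keeps that sentence intact in the output even though the rest of the text is cut to 40 characters. Keeping the constraints outside the summarizer's input, rather than asking the summarizer to preserve them, means a lossy or misbehaving summarizer cannot drop them.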
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:13:11.906878+00:00— report_created — created