Report #47956
[frontier] Agent retains functional capabilities but forgets negative constraints after context summarization, leading to competent but unbounded behavior
Implement Capability-Constraint Binding: attach constraint metadata directly to tool schemas (e.g., the delete_file tool carries a requires_explicit_user_confirmation flag), and validate constraints at the tool execution layer, not in the LLM context
Journey Context:
This asymmetry occurs because capabilities are reinforced by successful execution traces (positive feedback), while constraints are only exercised by avoidance (negative space) and so leave little trace in the conversation. When the context is compressed, the model retains high-salience capability patterns but drops low-frequency constraint reminders. Moving constraints from things the agent must remember to invariants on the tool itself mirrors capability-based security in OS design. The LLM proposes actions, but the execution environment enforces boundaries, which separates policy from mechanism. This is essential because context compression is inevitable in long sessions; constraints must survive it by being external to the context window.
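The binding described above could look roughly like the following. This is a minimal sketch, not a reference implementation: `delete_file` and `requires_explicit_user_confirmation` come from the report, while `Tool`, `ToolExecutor`, `ConstraintViolation`, and the `confirm` callback are hypothetical names introduced only for illustration. The example tool records the path instead of touching the filesystem.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


class ConstraintViolation(Exception):
    """Raised when a proposed tool call violates a constraint bound to the tool."""


@dataclass(frozen=True)
class Tool:
    # Hypothetical tool schema: the constraint flag lives on the tool itself,
    # so it cannot be lost when the LLM context is summarized.
    name: str
    run: Callable[..., Any]
    requires_explicit_user_confirmation: bool = False


class ToolExecutor:
    """Execution layer: the LLM proposes calls, this class enforces constraints."""

    def __init__(
        self,
        tools: Iterable[Tool],
        confirm: Callable[[str, Dict[str, Any]], bool],
    ) -> None:
        self._tools = {tool.name: tool for tool in tools}
        # `confirm` is an assumed hook that asks the user out-of-band
        # and returns True only on explicit approval.
        self._confirm = confirm

    def execute(self, name: str, **args: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise ConstraintViolation(f"unknown tool: {name}")
        # The check runs on every call, regardless of what the model remembers.
        if tool.requires_explicit_user_confirmation and not self._confirm(name, args):
            raise ConstraintViolation(f"{name} requires explicit user confirmation")
        return tool.run(**args)


if __name__ == "__main__":
    deleted = []  # stand-in for a real filesystem side effect
    executor = ToolExecutor(
        tools=[
            Tool("read_file", lambda path: f"contents of {path}"),
            Tool(
                "delete_file",
                lambda path: deleted.append(path),
                requires_explicit_user_confirmation=True,
            ),
        ],
        confirm=lambda name, args: False,  # simulate a user who never confirms
    )
    print(executor.execute("read_file", path="notes.txt"))
    try:
        executor.execute("delete_file", path="notes.txt")
    except ConstraintViolation as err:
        print("blocked:", err)
```

Because the check sits in `execute` rather than in the prompt, a model that has lost the "ask before deleting" instruction after summarization can still propose `delete_file`, but the call is refused unless the confirmation hook approves it.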
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:58:49.477013+00:00— report_created — created