Report #92492
[frontier] Agent forgets negative constraints but remembers capabilities in long sessions
Translate negative constraints into positive assertions and enforce constraints at the tool-execution layer rather than relying on prompt adherence.
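A minimal sketch of enforcing a constraint at the tool-execution layer follows. It assumes tools are plain Python callables dispatched by name. `ToolGuard` and `ConstraintViolation` are hypothetical names, not part of any agent framework's API. Because the check runs on every call, it holds regardless of how much session context has accumulated.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set


class ConstraintViolation(Exception):
    """Raised when the model requests a tool call the session policy forbids."""


@dataclass
class ToolGuard:
    # Hypothetical registry: tool name -> Python callable that implements it.
    tools: Dict[str, Callable[..., Any]]
    # Tool names the session policy forbids, e.g. {"delete_file"}.
    forbidden: Set[str] = field(default_factory=set)

    def execute(self, name: str, **kwargs: Any) -> Any:
        # Enforced in code on every call, so it cannot "fade" the way a
        # "do not" instruction in the prompt can over a long session.
        if name in self.forbidden:
            raise ConstraintViolation(f"tool '{name}' is disabled for this session")
        if name not in self.tools:
            raise KeyError(f"unknown tool '{name}'")
        return self.tools[name](**kwargs)


# Usage: an in-memory "file system" with a forbidden delete tool.
files = {"notes.txt": "hello"}
guard = ToolGuard(
    tools={
        "read_file": lambda path: files[path],
        "delete_file": lambda path: files.pop(path),
    },
    forbidden={"delete_file"},
)
print(guard.execute("read_file", path="notes.txt"))  # hello
try:
    guard.execute("delete_file", path="notes.txt")
except ConstraintViolation as exc:
    print(exc)  # tool 'delete_file' is disabled for this session
```

The rejected call raises instead of silently succeeding, so the agent loop can report the refusal back to the model rather than depending on the model to remember the rule.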
Journey Context:
LLMs encode capabilities as strong procedural weights, while negative constraints exist only as fragile context. In long sessions, attention shifts toward exercising the capability, and the 'do not' fades. Teams try repeating the negative constraint, but it still decays as the accumulated turns that exercise the capability outweigh it (the 'many-shot' effect). The 2026 approach is to remove the temptation entirely: restrict the tool schema (e.g., drop the 'delete_file' tool if deletion is forbidden) and rewrite prompts to affirm the desired path rather than forbid the undesired one.
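The two techniques above (restricting the tool schema and affirming the desired path) can be sketched as follows. This assumes tool schemas in a generic JSON-Schema function-calling shape (`name`, `description`, `parameters`); it is not any specific vendor's API. `TOOL_SCHEMAS`, `restrict_schema`, `affirmative_prompt`, and the prompt wording are hypothetical illustrations.

```python
from typing import Any, Dict, List, Set

_PATH_PARAMS = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}

# Hypothetical tool schemas in a generic JSON-Schema function-calling shape.
TOOL_SCHEMAS: List[Dict[str, Any]] = [
    {"name": "read_file", "description": "Read a file.", "parameters": _PATH_PARAMS},
    {"name": "write_file", "description": "Write a file.", "parameters": _PATH_PARAMS},
    {"name": "delete_file", "description": "Delete a file.", "parameters": _PATH_PARAMS},
]


def restrict_schema(schemas: List[Dict[str, Any]], forbidden: Set[str]) -> List[Dict[str, Any]]:
    """Keep only permitted tools, so forbidden ones are never offered to the model."""
    return [s for s in schemas if s["name"] not in forbidden]


def affirmative_prompt(allowed: List[Dict[str, Any]]) -> str:
    """Describe the permitted path instead of listing prohibitions (example wording)."""
    names = ", ".join(s["name"] for s in allowed)
    return (
        f"You manage files using these tools: {names}. "
        "When a file looks obsolete, leave it in place and list it in your summary."
    )


allowed = restrict_schema(TOOL_SCHEMAS, forbidden={"delete_file"})
print([s["name"] for s in allowed])  # ['read_file', 'write_file']
print(affirmative_prompt(allowed))
```

The prompt never mentions deletion at all: the model sees no 'delete_file' tool to be tempted by, and the instruction tells it what to do with obsolete files instead of what not to do.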
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:50:25.583535+00:00— report_created — created