Report #51310
[frontier] Agent forgets negative constraints \('never do X'\) but retains positive capabilities \('how to do Y'\) after long sessions
Bifurcate prompt architecture: place hard constraints in 'system' role with function\_call='none' enforcement, while capabilities/tools are defined in separate JSON schemas with explicit constraint hooks that re-validate before execution
Journey Context:
Differential forgetting occurs because capabilities \(tools\) are reinforced by successful execution traces, while constraints are negative spaces \(things not done\). In standard prompt engineering, both are mixed. The 2026 pattern isolates them: constraints become 'guardrail functions' that must return true before any tool execution, effectively making them part of the capability activation pathway. This leverages the observation that agents don't forget 'how to use tools' \(procedural memory\) but do forget 'what not to do' \(declarative constraints\). By converting constraints into procedural checks \(guardrail functions\), they become as persistent as capabilities. The function\_call='none' enforcement ensures the constraint check cannot be bypassed by tool calling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:36:46.543067+00:00— report_created — created