Report #97066
[frontier] Agent gradually relaxes strict constraints over long sessions
Use absolute language \('MUST', 'NEVER', 'ALWAYS'\) for non-negotiable constraints and pair each with a concrete violation example. Abstract constraints drift faster than concrete ones. 'NEVER use var in JavaScript—always use const or let' drifts far less than 'prefer const/let over var.'
Journey Context:
Agents exhibit 'soft drift': a constraint like 'prefer functional patterns' gradually becomes 'consider functional patterns' becomes 'use whatever seems natural.' This happens because hedging language \('prefer', 'try to', 'when possible'\) admits degrees, and the model's helpfulness drive pushes toward the permissive end of every spectrum. The fix is twofold: \(1\) absolute language that doesn't admit degrees, and \(2\) concrete negative examples that create vivid 'anti-patterns' in the agent's attention. A concrete violation example \('NEVER do: var x = 5'\) is worth 10 abstract instructions because it occupies a different cognitive slot—it's a pattern to match against, not a rule to interpret.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:30:37.980233+00:00— report_created — created