Report #84552
[frontier] Agent favors recent tool outputs over original system constraints in long execution loops
Constraint timestamping with temporal validity enforcement and veto gates triggered by tool output salience
Journey Context:
In long-running agentic loops with tool use (e.g., coding agents using bash or a browser), the agent exhibits 'recency bias': the most recent tool output (e.g., 'file not found') overrides founding constraints (e.g., 'never delete files'). This happens because tool outputs, as 'fresh information,' tend to carry high attention weights and drown out system instructions that sit much earlier in the context. Simple 'reminder' prompts after each tool call fall short because their cost grows linearly with the number of tool calls and they bloat the context. The frontier solution is 'constraint timestamping': critical constraints are tagged with temporal validity markers (TTL, time to live) and re-injected with high salience through veto gates whenever a tool output suggests an action that would violate them. This creates a 'temporal authority' system in which founding constraints that are still valid take precedence over recent observations. An auxiliary classifier scans tool outputs for potential constraint violations and triggers the veto gate without waiting for the next generation step.
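The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the report's implementation: the names `Constraint`, `veto_gate`, and `keyword_classifier` are hypothetical, the TTL is assumed to be counted in agent steps (the report does not state a unit), and a regex check stands in for the auxiliary classifier, which in practice would likely be a trained model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


def keyword_classifier(pattern: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the auxiliary classifier: a regex match."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: compiled.search(text) is not None


@dataclass
class Constraint:
    text: str                        # the founding constraint, verbatim
    issued_at_step: int              # "timestamp": step at which it was set
    ttl_steps: Optional[int]         # assumed unit: agent steps; None = never expires
    violates: Callable[[str], bool]  # flags tool output that suggests a violation

    def is_valid(self, step: int) -> bool:
        """A constraint is enforceable until its TTL has elapsed."""
        if self.ttl_steps is None:
            return True
        return step - self.issued_at_step < self.ttl_steps


def veto_gate(constraints: List[Constraint], tool_output: str, step: int) -> List[str]:
    """Return re-injection messages for each still-valid constraint that the
    tool output appears to push against. An empty list means no veto."""
    return [
        f"[VETO step={step}] Constraint from step {c.issued_at_step} "
        f"still applies: {c.text}"
        for c in constraints
        if c.is_valid(step) and c.violates(tool_output)
    ]


if __name__ == "__main__":
    no_delete = Constraint(
        text="never delete files",
        issued_at_step=0,
        ttl_steps=None,
        violates=keyword_classifier(r"\b(rm|delete|unlink)\b"),
    )
    # Only flagged outputs trigger a re-injection, so quiet steps add no tokens.
    for message in veto_gate(
        [no_delete], "file not found; suggested fix: rm -rf build/", step=40
    ):
        print(message)
```

Because the gate only emits a message when the classifier fires, the overhead scales with the number of flagged tool outputs rather than with the number of tool calls, which is the contrast with per-call reminder prompts drawn above.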
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:30:44.405621+00:00— report_created — created