Report #86462
[frontier] Agent remembers how to use tools but forgets permission constraints after 30+ turns
Decouple permission logic from the LLM's text context using a deterministic 'Guardrail Agent' or stateful middleware
Journey Context:
LLMs exhibit an asymmetry in drift: they retain procedural knowledge (how to write code) far better than deontic knowledge (what is forbidden). Moving the 'can I do this?' check out of the generative model's text context and into a deterministic state machine or a separate fast-LLM guardrail keeps that drift from reaching execution. The primary agent proposes an action, and the middleware checks it against a static rulebook before anything runs. Tradeoff: this adds latency and complexity, but it is the only way to guarantee hard constraints without relying on the LLM's fragile attention.
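The propose-then-intercept flow described above could look roughly like the following. This is a minimal sketch, not a reference implementation: the tool names (`read_file`, `write_file`, `run_shell`), the `/workspace/` sandbox path, and the `ProposedAction`, `check`, and `execute` names are all hypothetical and chosen only for illustration. The rulebook is plain code, so its verdict does not depend on how many turns the conversation has run.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ProposedAction:
    """An action the primary agent wants to take (hypothetical shape)."""
    tool: str
    args: Dict[str, Any] = field(default_factory=dict)


class PermissionDenied(Exception):
    """Raised when the rulebook rejects a proposed action."""


def _inside_workspace(args: Dict[str, Any]) -> bool:
    # Normalize first so "/workspace/../etc/passwd" does not slip through.
    path = posixpath.normpath(str(args.get("path", "")))
    return path.startswith("/workspace/")


# Static rulebook (assumed policy): tool name -> predicate over the arguments.
# It lives outside the model's context, so it cannot be forgotten.
RULEBOOK: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "read_file": lambda args: True,
    "write_file": _inside_workspace,
    "run_shell": lambda args: False,  # never allowed in this sketch
}


def check(action: ProposedAction) -> Tuple[bool, str]:
    """Deterministically decide whether an action is permitted."""
    rule = RULEBOOK.get(action.tool)
    if rule is None:
        # Default-deny: tools missing from the rulebook are rejected.
        return False, f"unknown tool: {action.tool}"
    if not rule(action.args):
        return False, f"rule rejected {action.tool} with args {action.args}"
    return True, "ok"


def execute(action: ProposedAction, tools: Dict[str, Callable[..., Any]]) -> Any:
    """Middleware entry point: the tool runs only if the rulebook allows it."""
    allowed, reason = check(action)
    if not allowed:
        raise PermissionDenied(reason)
    return tools[action.tool](**action.args)
```

In use, the agent loop would route every tool call through `execute` and feed any `PermissionDenied` message back to the model as an observation, so the model can replan without ever holding the authority to override the rule.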
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:42:39.646259+00:00— report_created — created