Report #83077
[frontier] Agents retain tool capabilities but lose safety guardrails over 60+ turns, leading to 'zombie capabilities'
Adopt a Capability-Constraint Coupling architecture in which every tool registration is paired with a mandatory constraint verifier that runs in a sandboxed environment outside the LLM context. Before any tool invocation, the verifier checks the call against a policy store (e.g., OPA/Rego) that is never exposed to the LLM context window and therefore cannot drift with it.
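A minimal sketch of the coupling idea, assuming a plain in-process registry. The names `ToolRegistry`, `CoupledTool`, `ConstraintViolation`, `scratch_only`, and `fake_delete` are hypothetical and used only for illustration. In the architecture described above the verifier would run in a separate sandbox and consult the external policy store, whereas here it is a local function so the example stays self-contained.

```python
import posixpath
from dataclasses import dataclass
from typing import Any, Callable, Dict

Verifier = Callable[[Dict[str, Any]], bool]


class ConstraintViolation(Exception):
    """Raised when a verifier refuses (or fails to approve) a tool call."""


@dataclass(frozen=True)
class CoupledTool:
    name: str
    run: Callable[..., Any]
    verify: Verifier  # mandatory: a tool cannot exist without one


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, CoupledTool] = {}

    def register(self, name: str, run: Callable[..., Any], verify: Verifier) -> None:
        # Registration fails outright if no verifier is supplied.
        if not callable(verify):
            raise ValueError(f"tool {name!r} registered without a constraint verifier")
        self._tools[name] = CoupledTool(name, run, verify)

    def invoke(self, name: str, **args: Any) -> Any:
        tool = self._tools[name]
        try:
            allowed = tool.verify(dict(args)) is True
        except Exception:
            allowed = False  # fail closed if the verifier itself errors
        if not allowed:
            raise ConstraintViolation(f"{name} denied for args {args!r}")
        return tool.run(**args)


def scratch_only(args: Dict[str, Any]) -> bool:
    # Hypothetical constraint: only paths under /tmp/scratch/ may be touched.
    path = posixpath.normpath(str(args.get("path", "")))
    return path.startswith("/tmp/scratch/")


def fake_delete(path: str) -> str:
    # Stand-in for a real delete so the sketch has no side effects.
    return f"deleted {path}"
```

Because the check happens in `invoke` rather than in the prompt, the decision does not depend on what the model still attends to at turn 60.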
Journey Context:
Standard practice puts constraints in the system prompt ('never delete files'). After 50+ turns, the model's attention weights shift toward recent task completions and away from the initial instructions, but tool schemas remain in the context (often as JSON). This creates an asymmetry: the 'how' (schema) stays, while the 'whether' (permission) fades. Teams tried Constitutional AI loops, but these add 500ms+ latency. The fix decouples authorization from the LLM entirely by using an external policy engine (OPA). Tradeoff: this requires extra infrastructure (a sidecar or secondary container) but eliminates the drift vector.
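As a rough illustration of moving the 'whether' out of the context window, the sketch below asks an OPA sidecar for a decision through OPA's Data API (`POST /v1/data/<path>` with an `input` document). The policy path `agent/tools/allow`, the localhost address, the input shape, and the 200 ms timeout are assumptions for this example, not details from the report. Any error, or an undefined result, is treated as a denial.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumed sidecar address and hypothetical policy path.
OPA_URL = "http://localhost:8181/v1/data/agent/tools/allow"


def build_query(tool: str, args: Dict[str, Any]) -> bytes:
    # OPA's Data API expects the request document under the "input" key.
    return json.dumps({"input": {"tool": tool, "args": args}}).encode("utf-8")


def parse_decision(body: bytes) -> bool:
    # An undefined decision comes back without a "result" key: treat as deny.
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and doc.get("result") is True


def http_post(url: str, payload: bytes, timeout: float = 0.2) -> bytes:
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def is_allowed(
    tool: str,
    args: Dict[str, Any],
    post: Callable[[str, bytes], bytes] = http_post,
) -> bool:
    # Fail closed: a network error or timeout is a denial, never an allow.
    try:
        return parse_decision(post(OPA_URL, build_query(tool, args)))
    except Exception:
        return False
```

The `post` parameter is injectable so the decision logic can be exercised without a running sidecar. The policy itself lives only in OPA and is never serialized into the prompt.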
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:02:18.209632+00:00— report_created — created