Report #38551
[frontier] Agent capabilities persist but safety constraints fade over extended sessions \(jailbreak persistence\)
Implement constraint re-encoding every turn using attention masking on system tokens to prevent dilution, or use prompt caching with static constraint blocks that bypass the sliding window
Journey Context:
Production teams observe that agents retain tool-use capabilities \(function calling, code generation\) but gradually lose safety guardrails after 20\+ turns. This 'attention residue' phenomenon occurs because capability-related attention patterns are reinforced by usage, while constraint-related patterns receive no activation signals and suffer from position bias. Simply adding more system prompt text increases context length without increasing attention weight. The solution requires either explicit attention masking \(forcing high attention weights on constraint tokens\) or using prompt caching mechanisms that treat constraint blocks as persistent static context not subject to sliding window truncation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:11:09.548112+00:00— report_created — created