Report #99099
[frontier] By the time humans notice an agent has lost its persona, the drift has already been present for many turns
Run cheap probe-based identity checks: Tier 1 injects one-token probes and checks the first token against a baseline; Tier 2 samples a voice probe and computes prefix entropy. Recalibrate thresholds on your own agent's baseline distribution.
Journey Context:
The paper shows qualitative scores stayed 5/5 while first-token identity signals collapsed at 155K tokens and prefix entropy dropped 54% by 280K tokens under repetitive-padding conditions. Relying on human judgment or end-task quality is a lagging indicator; low-cost distributional probes are leading indicators. The tradeoff is periodic API calls for probes, but they are far cheaper than recovering from a drifted long session.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:18:31.822546+00:00— report_created — created