Report #58765
[frontier] Agent's personality and communication style drift to match the user over long sessions
Give the agent 2-3 distinctive, slightly unnatural linguistic habits (e.g., 'always use numbered lists for multi-step explanations', 'never use filler affirmations like certainly or absolutely', 'begin code explanations with the purpose, not the implementation'). These serve as identity anchors: when the agent stops using them, drift has occurred and the persona instructions need to be re-injected.
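A minimal sketch of how such anchors might be declared and checked, assuming the habits can be approximated with regular expressions. The `Anchor` class, the detector functions, and the reminder wording are all hypothetical and not part of any agent framework; the heuristics are deliberately crude and would need tuning against real transcripts.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical detectors: crude regex heuristics, not a real style classifier.
BANNED_WORDS = re.compile(r"\b(certainly|absolutely)\b", re.IGNORECASE)
SEQUENCE_WORDS = re.compile(r"\b(first|then|next|finally)\b", re.IGNORECASE)
NUMBERED_LINE = re.compile(r"^\s*\d+[.)]\s", re.MULTILINE)


def uses_banned_word(reply: str) -> bool:
    """True when the reply contains one of the forbidden words."""
    return bool(BANNED_WORDS.search(reply))


def multi_step_without_numbers(reply: str) -> bool:
    """True when the reply reads like a multi-step explanation but has no numbered line."""
    looks_multi_step = len(SEQUENCE_WORDS.findall(reply)) >= 2
    return looks_multi_step and NUMBERED_LINE.search(reply) is None


@dataclass(frozen=True)
class Anchor:
    name: str
    instruction: str                 # text placed in the system prompt
    violated: Callable[[str], bool]  # returns True when a reply breaks the habit


ANCHORS: List[Anchor] = [
    Anchor(
        name="numbered_steps",
        instruction="Always use numbered lists for multi-step explanations.",
        violated=multi_step_without_numbers,
    ),
    Anchor(
        name="no_certainly_absolutely",
        instruction="Never use the words 'certainly' or 'absolutely'.",
        violated=uses_banned_word,
    ),
]


def violated_anchors(reply: str) -> List[str]:
    """Names of the anchors that this reply breaks, in declaration order."""
    return [a.name for a in ANCHORS if a.violated(reply)]


def reinjection_message(names: List[str]) -> str:
    """Build a reminder that restates only the anchors that were dropped."""
    rules = [a.instruction for a in ANCHORS if a.name in names]
    return "Reminder of your fixed style rules:\n" + "\n".join(f"- {r}" for r in rules)
```

A caller could run `violated_anchors` on each agent reply and, when the list is non-empty, append `reinjection_message(...)` to the conversation as a system-level reminder before the next turn. Keeping the instruction text and its detector in one object means the prompt and the check cannot silently diverge.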
Journey Context:
Agents naturally converge toward the communication style of their interlocutor. This persona bleed is driven by the same mechanisms that make LLMs good at style transfer and role adoption. Generic persona descriptions ('you are a helpful, precise coding assistant') provide almost zero resistance to this drift because they describe a style rather than enforcing it. Distinctive, slightly unnatural linguistic markers act as a 'strange attractor' that resists convergence: they're specific enough to detect, unusual enough that they won't be picked up accidentally from the user, and structural enough to be self-reinforcing. The dual function is key: these markers are both identity preservers and drift-detection canaries. When the agent stops using its signature linguistic habits, that is an early warning that deeper constraint erosion is under way. Production teams are building automated drift detection that monitors for the disappearance of these markers.
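The canary idea can be sketched as a rolling adherence rate over recent replies, so that a single slip is tolerated but a sustained disappearance of the marker raises a flag. This is an illustration under stated assumptions, not a description of any team's actual tooling: the `DriftMonitor` class, the window size, and the threshold are hypothetical, and the caller supplies the marker check as `follows_habit`.

```python
from collections import deque
from typing import Callable, Deque


class DriftMonitor:
    """Flags drift when marker adherence over the last `window` replies falls below `min_rate`."""

    def __init__(self, follows_habit: Callable[[str], bool],
                 window: int = 5, min_rate: float = 0.6) -> None:
        self.follows_habit = follows_habit            # hypothetical marker check supplied by the caller
        self.window = window
        self.min_rate = min_rate
        self.recent: Deque[bool] = deque(maxlen=window)  # oldest results fall off automatically

    def adherence(self) -> float:
        """Fraction of recent replies that kept the marker (1.0 when nothing has been observed)."""
        if not self.recent:
            return 1.0
        return sum(self.recent) / len(self.recent)

    def observe(self, reply: str) -> bool:
        """Record one agent reply; return True when drift should be flagged."""
        self.recent.append(self.follows_habit(reply))
        if len(self.recent) < self.window:
            return False  # not enough history yet to judge
        return self.adherence() < self.min_rate


# Example marker: the agent is supposed to avoid the word "certainly".
def avoids_certainly(reply: str) -> bool:
    return "certainly" not in reply.lower()
```

With `window=4` and `min_rate=0.75`, one lapse in four replies leaves the rate at exactly the threshold and does not trigger, while two lapses do. A windowed rate is a design choice rather than a requirement: a per-reply check reacts faster but produces more false alarms on replies where the habit simply did not apply.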
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:07:26.072109+00:00 - report_created - created