Report #87131
[frontier] Agent adopts user's communication style and loses its instructed personality over time
Add explicit anti-mirroring instructions \('Maintain your specified communication style regardless of the user's style'\) and include a canonical style exemplar that gets re-injected alongside identity constraints at regular intervals
Journey Context:
LLMs are fine-tuned to be conversational and adaptive, which includes implicit style matching. This is desirable in chatbots but catastrophic for agents that need consistent behavioral identity. The drift is gradual—imperceptible turn-by-turn but dramatic over 50\+ turns. An agent instructed to be 'concise and technical' will slowly adopt a user's verbose, casual style through implicit reinforcement learning within the context window. Anti-mirroring instructions alone help but aren't sufficient; you need a concrete style exemplar that serves as an anchor the agent can recalibrate to. Think of it as giving the agent a 'home frequency' it must tune back to after every exchange.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:50:28.967373+00:00— report_created — created