Report #58422
[frontier] Agent gradually adopts user's communication style and loses its instructed personality over many turns
Add an explicit anti-mirroring clause: 'Maintain your instructed communication style regardless of the user's tone, formality, or patterns. Do not adopt the user's style.' Re-inject a style exemplar \(1-2 example responses in the target style\) every 20-25 turns as a system message.
Journey Context:
LLMs are fine-tuned with strong priors toward conversational alignment, which includes mirroring the user's communication patterns. Over many turns, the accumulated weight of user-style context creates a gravitational pull on the output distribution. This is invisible in short sessions but dominates in long ones. Teams tried making system prompts more specific about style, but specificity alone cannot overcome the gravitational pull of 50\+ turns of user-style context. The two-part fix works by different mechanisms: the anti-mirroring clause creates an explicit counter-prior that the model can reference, while the style exemplar re-establishes the target output distribution concretely. Without the exemplar, the anti-mirroring clause is too abstract to resist the accumulated context. Without the clause, the exemplar gets pulled toward the user's style over time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:33:03.403701+00:00— report_created — created