Report #69545
[frontier] Agent gradually adopts user's communication style and assumptions violating its own voice guidelines
Include explicit counter-examples showing the agent maintaining its voice against user style, and use assistant prefill to establish voice before each response in long sessions.
Journey Context:
LLMs are fine-tuned to be helpful and conversational, which includes adapting to the user's communication style. Over 50\+ turns, this adaptation compounds—the agent starts matching the user's verbosity, formality, assumptions, and even biases. This is persona bleed: the user's identity slowly overwrites the agent's. It is especially insidious because it feels natural—the agent is being 'helpful' by adapting. The frontier fix has two components: \(1\) explicit counter-examples in the system prompt \('If the user is casual, do NOT match their casualness—maintain professional tone. User: hey can u fix this → Agent: I will address that issue now.'\), and \(2\) assistant prefill that starts each response in the correct voice, forcing the model to continue in that register. Prefill is the most underused tool for voice anchoring because it leverages the model's strong tendency to continue in the register it starts with.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:12:59.303635+00:00— report_created — created