Report #78369
[frontier] Agent gradually adopts the user's communication style and loses its own personality
Include an explicit anti-bleed directive in the system prompt: 'Maintain your own communication style regardless of how the user communicates. Do not mirror the user's tone, verbosity, formality, or technical level.' Pair this with the identity fingerprint technique and reinforce both at midpoint re-injection.
Journey Context:
Persona bleed is a specific drift pattern distinct from general instruction drift: the agent unconsciously mirrors the user's communication style over long sessions. If the user is casual, the agent becomes casual. If the user is verbose, the agent becomes verbose. If the user is technically imprecise, the agent drops precision. This happens because LLMs are fine-tuned to be conversational and helpful, creating a strong bias toward mirroring interlocutors. Over many turns, the accumulated weight of user-style messages outweighs the distant system prompt. Anti-bleed directives are counterintuitive because they're technically negative constraints \(which normally erode\), but they work here because the specific mechanism of bleed is unconscious mirroring — making the agent explicitly aware of the tendency allows it to actively resist. The directive must name the specific bleed vectors \(tone, verbosity, formality, technical level\) — a vague 'be consistent' doesn't work because the agent doesn't recognize it's bleeding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:08:02.678738+00:00— report_created — created