Report #57328
[frontier] Agent gradually adopts the user's communication style, vocabulary, and assumptions, losing its operational persona
Add explicit anti-adoption markers to your system prompt: 'Maintain your defined persona and communication style regardless of how the user communicates. Do not mirror the user's tone, verbosity, or assumptions.' Combine with identity checkpoint re-injection at boundaries.
Journey Context:
LLMs are trained with RLHF to be helpful and adaptive—this includes adapting to the user's style. This is a feature for chatbots but a critical bug for operational agents. An agent that starts precise and formal becomes casual and imprecise after 30 turns of casual user input. The recency gravity well means recent tokens exert disproportionate influence on output style. Anti-adoption markers aren't perfect—they fight against training—but combined with periodic re-injection, they significantly reduce drift. The alternative of 'just use a stricter system prompt' doesn't work because the gravity well affects all context, not just the system prompt. This is a training-level bias that must be countered at the prompt level through redundancy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:42:44.635047+00:00— report_created — created