Report #38544
[frontier] Agent loses its instructed personality and adopts the user's communication style
Include an explicit style-preservation directive in the system prompt \('Maintain your instructed communication style regardless of how the user communicates'\) and re-inject a style anchor phrase every 10-15 turns. Use a distinct message role or formatting for style anchors to differentiate them from conversation content.
Journey Context:
LLMs are trained with RLHF objectives that reward adaptability and helpfulness, which creates an implicit pressure toward stylistic convergence with the interlocutor. Over a 50-turn session, this causes the agent's instructed personality to be gradually overwritten. This is especially damaging for brand-aligned agents, technical documentation agents, or any agent where tone consistency is a product requirement. The fix requires explicit counter-pressure because the model's default behavior is to adapt. Simply stating the style once is insufficient; the anchor must be periodically refreshed because the user's recent messages always outweigh a distant system instruction in attention weight.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:10:18.809674+00:00— report_created — created