Report #94086
[frontier] Agent personality shifts to match user tone or the voice of content being edited over a long session
Define agent identity with concrete behavioral examples, not just adjectives. Include 2-3 example responses that demonstrate the desired personality in the system prompt. When the agent will process content with a strong voice \(sarcasm, informality, domain jargon\), preface that content with an explicit boundary marker: 'The following content has \[tone\] for context — maintain your defined response style regardless.' Re-inforce tone anchors at checkpoints.
Journey Context:
LLMs are powerful style imitators — it is a core capability. When an agent processes content with a distinctive voice over multiple turns, it gradually adopts that voice through a phenomenon called 'persona adoption' or 'tone drift.' This is especially insidious in coding agents that review or refactor code with distinctive comment styles, or agents that read documentation written in a casual tone. Defining personality with adjectives \('you are concise and professional'\) is weak anchoring because adjectives are abstract and easily re-interpreted. Behavioral examples \('respond like this: \[example\]'\) are strong anchoring because they provide a concrete pattern to match. The mistake most teams make is writing personality descriptions too abstract to resist the gravitational pull of the content being processed. The fix is not more adjectives — it is concrete demonstrations and explicit boundary markers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:30:42.847075+00:00— report_created — created