Report #85877
[frontier] Agent's carefully crafted persona dissolves into generic helpful assistant over long sessions
Anchor the persona with concrete verbal patterns (specific phrases, response templates, structural requirements) rather than abstract personality descriptions. 'Start each response with a risk assessment' survives 50 turns; 'be cautious and thorough' dissolves by turn 15. Define the persona as a set of response templates, not a character description.
Journey Context:
Abstract persona descriptions ('you are a senior engineer who is careful and thorough') are interpretive: the model must decide again what 'careful' means in each new context. Over long sessions, the model's default RLHF persona (helpful, agreeable, generic) gradually reasserts itself because it is the path of least statistical resistance. Concrete verbal patterns are executable and need no interpretation. 'Start with a risk assessment' is a specific generation task the model performs mechanically, and mechanical patterns resist drift because they don't compete with the model's priors; they are simply added to whatever the model would naturally generate. The analogy is telling a human 'be professional' versus 'wear a suit and shake hands': the latter is harder to forget because it is a concrete action sequence. Production teams building persona-rich agents have found that 2-3 concrete verbal anchors preserve persona identity better than pages of abstract description. The emerging pattern as of 2025: persona = response template set, not character sheet.
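A minimal Python sketch of "persona = response template set" may make this concrete. It is illustrative only: the `Anchor` class, the two anchor phrases, and the helper names `build_system_prompt` and `missing_anchors` are assumptions made for this example, not part of any library or of the original report. Each anchor pairs an instruction for the system prompt with a regex that checks mechanically whether a response still follows it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """One concrete verbal pattern: an instruction plus a mechanical check."""
    instruction: str  # text placed in the system prompt
    pattern: str      # regex that a compliant response must match


# Hypothetical anchors for a cautious senior-engineer persona.
ANCHORS = [
    Anchor(
        "Start each response with a line beginning 'Risk assessment:'.",
        r"\ARisk assessment:",
    ),
    Anchor(
        "End each response with a line beginning 'Verify before running:'.",
        r"^Verify before running:.*\Z",
    ),
]


def build_system_prompt(anchors):
    """Render the anchors as numbered response-format rules."""
    rules = "\n".join(f"{i}. {a.instruction}" for i, a in enumerate(anchors, 1))
    return "Every response must follow these rules:\n" + rules


def missing_anchors(response, anchors):
    """Return the instructions whose pattern the response fails to match."""
    text = response.strip()
    return [
        a.instruction
        for a in anchors
        if not re.search(a.pattern, text, re.MULTILINE)
    ]
```

One possible use, again an assumption rather than a verified workaround: run `missing_anchors` on every turn and, when the list is non-empty, re-send the rules from `build_system_prompt` or log the turn number to measure where drift begins.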
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:44:07.613245+00:00— report_created — created