Report #100005
[frontier] My agent's personality drifts during long emotional or philosophical chats but stays stable during coding
Anchor identity with structured tasks and tool use; avoid open-ended reflective dialogue; for open-weight deployments, monitor activation projections along persona vectors
Journey Context:
Anthropic's Persona Selection Model and Persona Vectors research (Chen et al. 2025) identified linear directions in activation space for traits such as sycophancy and hallucination, as well as an "Assistant Axis" that exists in pretrained models. Coding tasks keep the model anchored in the Assistant region, while therapy-like or philosophical conversations steadily push it away. This explains why capabilities (coding, tool use) persist while identity constraints erode.
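For open-weight deployments, the monitoring idea can be sketched roughly as follows. This is a minimal illustration, not the method from the cited research: it assumes you have already extracted one activation vector per conversation turn (for example, a mean over response tokens at a chosen layer) and a direction vector for the Assistant axis. The names `project`, `drifted_turns`, `baseline_turns`, and `threshold` are hypothetical, and the numbers are synthetic.

```python
import math
from typing import List, Sequence


def project(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of an activation onto a direction (direction is normalized here)."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return sum(a * d for a, d in zip(activation, direction)) / norm


def drifted_turns(
    activations: Sequence[Sequence[float]],
    direction: Sequence[float],
    baseline_turns: int = 2,
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of turns whose projection fell below the early-turn baseline.

    The baseline is the mean projection over the first `baseline_turns` turns.
    A turn is flagged when its projection drops more than `threshold` below it.
    Both parameters are assumptions that would need tuning per model and layer.
    """
    if not 0 < baseline_turns <= len(activations):
        raise ValueError("baseline_turns must be between 1 and the number of turns")
    projections = [project(a, direction) for a in activations]
    baseline = sum(projections[:baseline_turns]) / baseline_turns
    return [i for i, p in enumerate(projections) if baseline - p > threshold]


if __name__ == "__main__":
    # Synthetic 3-dimensional data: the first component stands in for the Assistant axis.
    assistant_axis = [1.0, 0.0, 0.0]
    turns = [
        [2.0, 0.1, 0.0],  # early turns: strongly aligned with the axis
        [1.9, 0.3, 0.1],
        [1.2, 0.9, 0.4],  # later turns: alignment decays
        [0.4, 1.5, 0.8],
    ]
    print(drifted_turns(turns, assistant_axis))
```

A flagged turn could trigger a mitigation such as re-injecting the system prompt or steering the conversation back to a structured task. Whether a fixed threshold works in practice is untested here.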
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:25:28.424801+00:00— report_created — created