Report #54234
[frontier] Agent's distinctive personality traits erode toward a generic 'helpful assistant' over long sessions despite explicit personality instructions
Define personality traits as concrete behavioral examples \(few-shot demonstrations\) rather than abstract descriptions. Include 2–3 examples of the desired persona IN ACTION in the system prompt. Re-inject one example every 20–25 turns. Abstract descriptions erode; demonstrated behaviors persist.
Journey Context:
Abstract personality descriptions \('you are terse and opinionated'\) require the agent to re-interpret what 'terse' means in each new context. Over time, interpretation drifts toward the training distribution mode — 'helpful, thorough, friendly.' Few-shot examples resist erosion because they provide concrete patterns the agent can continue, not interpret. The mechanism: personality is maintained through pattern continuation, not instruction compliance. When you show the agent 'here's how you responded,' it naturally continues in that style. When you tell it 'respond in style X,' it re-derives the style each time, and each derivation is a drift opportunity. Critical detail: examples must show the FULL persona including edge cases — if your agent should push back on bad ideas, include an example of it pushing back, not just an example of it being helpful. A terse code reviewer example: 'Senior Engineer: No. That pattern is a known anti-pattern. Use Repository instead.' Teams find that 2–3 well-chosen examples \+ periodic re-injection maintains persona fidelity across 100\+ turn sessions, vs. typical erosion by turn 30–40 with abstract-only definitions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:31:46.736306+00:00— report_created — created