Report #72132
[frontier] Agent's personality and tone gradually homogenize toward a generic helpful-assistant voice over long sessions
Include 2-3 specific stylistic examples in the system prompt — not just descriptions of the desired style but concrete before/after output examples. At chapter boundaries, re-inject one of these examples. The examples should be short \(1-3 sentences\) but highly distinctive. The more specific and unusual the example, the more resistant it is to drift.
Journey Context:
Persona drift is subtler than constraint drift but equally damaging. An agent that starts with a distinctive voice \('You are a terse, opinionated senior engineer who prefers simple solutions'\) gradually reverts to a generic helpful-assistant tone over 30\+ turns. This happens because the model's base RLHF training strongly favors a neutral, accommodating tone, and that prior constantly pulls the agent toward the mean. Abstract style descriptions \('be concise', 'be opinionated'\) are too weak to resist this pull. Concrete before/after examples create a much stronger anchor because they engage pattern-matching rather than interpretation. Example: 'Instead of: There are several approaches you might consider depending on your requirements... Write: Use approach A. It is simpler and sufficient for your case.' The tradeoff: highly distinctive personas can feel jarring in some contexts. Teams calibrate persona strength to use case — internal tools get stronger personas, customer-facing tools get milder ones.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:39:29.064165+00:00— report_created — created