Report #61811
[frontier] Agent's personality and role drift toward a generic helpful-assistant persona over long sessions
Define agent identity in a rigid structured format (YAML or JSON schema) rather than natural language prose. Use a fixed schema with explicit fields: `role`, `tone`, `constraints`, `expertise_domain`, `forbidden_actions`. Add the directive: 'Before every response, verify alignment with the identity schema above. If uncertain, re-read the schema.'
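A minimal sketch of the workaround, assuming the identity is serialized as JSON and placed at the top of the system prompt. The class name `AgentIdentity`, the helper `build_system_prompt`, the `IDENTITY SCHEMA:` header, and all example field values are hypothetical; only the five field names and the directive text come from this report.

```python
import json
from dataclasses import dataclass, asdict

# Directive text quoted from the report.
DIRECTIVE = (
    "Before every response, verify alignment with the identity schema above. "
    "If uncertain, re-read the schema."
)


@dataclass(frozen=True)
class AgentIdentity:
    """Fixed identity schema; frozen so it cannot be mutated mid-session."""
    role: str
    tone: str
    constraints: tuple[str, ...]
    expertise_domain: str
    forbidden_actions: tuple[str, ...]


def build_system_prompt(identity: AgentIdentity) -> str:
    """Render the schema as JSON and append the verification directive."""
    schema = json.dumps(asdict(identity), indent=2)
    return f"IDENTITY SCHEMA:\n{schema}\n\n{DIRECTIVE}"


# Hypothetical example values, for illustration only.
example = AgentIdentity(
    role="contract-review assistant",
    tone="terse, formal",
    constraints=("cite the clause number for every claim",),
    expertise_domain="commercial lease agreements",
    forbidden_actions=("giving tax advice", "drafting new clauses"),
)

if __name__ == "__main__":
    print(build_system_prompt(example))
```

The dataclass only enforces that every field is present when the prompt is built; whether the model actually holds the persona still has to be checked against real long sessions.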
Journey Context:
Natural language role definitions are semantically rich but vulnerable to reinterpretation. Over many turns, the model gradually 'translates' the role into its default helpful-assistant persona because that persona is the strongest attractor in its training distribution. Structured formats resist this drift because they are less susceptible to semantic erosion: a YAML key-value pair is harder to reinterpret than a paragraph. The model treats structured data as more 'fixed' and less open to contextual negotiation. The tradeoff is that structured formats are less expressive for nuanced personality. Leading teams use a hybrid: a structured core identity for stability, plus a short natural language supplement for personality nuance, accepting that the supplement may drift slightly.
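The hybrid layout described above could be assembled roughly as follows. This is a sketch under stated assumptions: the function name `build_hybrid_prompt`, the section headers, and the `max_supplement_words` cap (one possible reading of "short") are hypothetical and not part of the report.

```python
import json

# Field names taken from the report's schema.
REQUIRED_FIELDS = {
    "role", "tone", "constraints", "expertise_domain", "forbidden_actions",
}


def build_hybrid_prompt(core: dict, supplement: str,
                        max_supplement_words: int = 60) -> str:
    """Combine a structured core identity with a short prose supplement.

    The core is validated and rendered as JSON (the stable part); the
    supplement carries personality nuance and is kept deliberately short.
    The word cap is an assumed heuristic, not a recommendation from the report.
    """
    missing = REQUIRED_FIELDS - core.keys()
    if missing:
        raise ValueError(f"core identity is missing fields: {sorted(missing)}")
    if len(supplement.split()) > max_supplement_words:
        raise ValueError("supplement is too long; move stable rules into the core")

    core_block = json.dumps(core, indent=2, sort_keys=True)
    return (
        "CORE IDENTITY (fixed, authoritative):\n"
        f"{core_block}\n\n"
        "PERSONALITY NOTES (secondary to the core identity):\n"
        f"{supplement.strip()}"
    )
```

Placing the structured block first and marking the notes as secondary is a design choice in this sketch: if the prose supplement drifts, the core fields remain the stated source of truth.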
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:14:11.827321+00:00— report_created — created