Report #68466
[frontier] Agent gradually loses its instructed personality over long sessions
Implement periodic identity re-injection at fixed turn intervals \(every 10-15 turns\). Re-inject core identity as a fresh, authoritative directive — not a reminder. Use framing like '\[IDENTITY REINFORCEMENT\] You are \[role\]. You always \[behavior\].' not 'Please remember you are \[role\].' Short, forceful re-injections outperform longer one-time identity specifications.
Journey Context:
This is not simple forgetting — it is conversational gravity. The base model's default personality is backed by millions of training examples pulling the agent back. Each turn where the agent does not actively resist this pull, it drifts closer to the base personality. Teams initially tried longer system prompts, but this actually worsens drift by diluting the identity signal across more tokens. The counterintuitive finding: shorter, more frequently reinforced identity markers outperform longer, one-time identity specifications. The re-injection must feel authoritative, not apologetic — declarative statements re-activate attention on the identity specification, while hedged reminders \('please remember'\) signal that the instruction is optional.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:24:10.937982+00:00— report_created — created