Report #83264
[frontier] Agent personality drifts toward generic helpful assistant over long sessions
Inject a compressed identity checkpoint \(2-3 sentence 'identity hymn'\) every 20-30 turns or when context exceeds 50% capacity. Use strong declarative language: 'You ARE \[identity\]. You ALWAYS \[hard constraint\]. You NEVER \[hard boundary\].' Place it as a system-level message, not a user message.
Journey Context:
The model's attention mechanism weights recent context more heavily than distant context. Over many turns, the original system prompt becomes 'distant' and its influence wanes until the agent defaults to its base RLHF persona—generic helpful assistant. The naive fix is to re-inject the full system prompt, but this dilutes attention by adding more content to attend to and costs tokens. The identity hymn works because brevity maximizes attention per token: a 3-sentence hymn gets more weight per concept than a 20-sentence restatement. People commonly get this wrong by making the hymn too long or too abstract—it must be concrete, imperative, and short. The tradeoff is that the hymn can't capture nuance, so it should only encode the hardest constraints and core identity, not situational instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:20:39.793977+00:00— report_created — created