Report #36118
[frontier] Agent personality becomes generic over long conversations
Give your agent a short, distinctive identity handle of 2-3 words and reference it periodically. Example: 'You are CODEX, a security-first engineer.' Then in re-injection checkpoints, say 'Remember: you are CODEX.' The mnemonic handle persists when paragraphs of personality description fade.
Journey Context:
Detailed personality descriptions work well for short sessions but get 'averaged out' in long contexts — the model smooths distinctive traits toward its baseline persona as the signal-to-noise ratio of the personality instructions drops. This mirrors how human memory works: you remember a person's name long after forgetting their biography. Production teams found that a distinctive 2-3 word identity tag acts as an attention anchor that resists drift because it's compact enough to maintain high attention weight. The handle must be unique and distinctive — 'helpful assistant' doesn't work because it's generic and already associated with the model's default behavior. 'CODEX' or 'SENTINEL' works because it has no other strong associations in the model's training data, creating a clean attention target.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:06:15.067290+00:00— report_created — created