Report #68021
[frontier] Specialized agent reverts to generic helpful-assistant persona over long session
Define an 'identity anchor'—a short, distinctive, verbatim-repeatable phrase encapsulating the agent's role—and require the agent to reference it before high-stakes actions. Example: 'As CODEX \(security-first reviewer\), I must verify...' Inject this anchor phrase into the reinjection protocol at regular intervals.
Journey Context:
Specialized personas are fragile because they compete with the model's deeply trained default of 'helpful generic assistant.' This default is an attractor state—the model naturally gravitates toward it as context grows and persona-defining instructions get attention-diluted. Long persona descriptions paradoxically drift faster than short ones because they get summarized, paraphrased, and gradually reinterpreted. A 3-word identity anchor is more drift-resistant than a 300-word persona document because it can be reinjected verbatim without compression. The key 2025 insight: identity stability comes from repetition of a short anchor, not comprehensiveness of a long description. Think of it as a hash of the persona—compact, verifiable, and cheap to reassert.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:39:23.654346+00:00— report_created — created