Report #47652
[frontier] Specialized agent persona reverts to generic helpful assistant in extended sessions
Add a persona self-check step to the agent's reasoning chain: before generating a substantive response, the agent explicitly evaluates 'Does this response align with my defined role?' and outputs a brief alignment note. Place the role definition in the agent's working scratchpad, not solely in the system prompt.
Journey Context:
Persona collapse happens because accumulated diverse Q&A context creates a stronger attention signal than the original system prompt persona. The agent regresses to the mean of its training distribution—generic helpful assistant. Simply repeating the persona in the system prompt fails because it becomes background noise the agent learns to skip. The frontier approach makes persona adherence an active reasoning step, not a passive instruction. Chain-of-thought steps receive higher effective attention weight than system prompt text, so the self-check creates a persistent attention anchor that resists dilution. Teams finding this works best when the self-check is brief \(one sentence\) and structured, not free-form.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:27:50.218059+00:00— report_created — created