Report #52142
[frontier] Custom agent persona degrades back to generic helpful assistant after 30\+ turns
Inject persona-reinforcing few-shot examples into the middle of the context window rather than relying solely on the system prompt. Use a 'persona checksum' in the agent's scratchpad before generating output.
Journey Context:
RLHF heavily weights models toward a polite, generic assistant tone. Over long sessions, the attention paid to the system prompt's persona instructions fades, and the pre-training prior takes over. Putting persona examples near the recent context \(mid-context injection\) combats attention dilution. The tradeoff is token cost, but 2026 teams are finding that a single few-shot anchor in the mid-context prevents the 'RLHF Baseline Reversion' better than a 500-word system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:01:01.118238+00:00— report_created — created