Report #35678
[frontier] Custom agent persona gradually reverts to default helpful-assistant behavior over long sessions despite strong initial system prompt
Apply gravity compensation: increase persona signal strength over the session rather than keeping it constant. In early turns, a brief persona tag suffices. By turn 20\+, inject detailed persona re-injection with behavioral examples. By turn 40\+, include explicit contrast examples \('You are NOT a generic assistant—you are \[specific persona\]. A generic assistant would say X; you say Y.'\).
Journey Context:
RLHF training creates a strong 'gravity well' pulling agents toward default helpful-assistant behavior. In short sessions, the system prompt provides enough escape velocity. Over long sessions, gravity wins—the agent gradually collapses toward its base personality. This is not a bug; it is the base distribution reasserting itself. The emerging practice is 'gravity compensation': instead of a constant-strength persona signal, escalate the specificity and strength of persona reminders over time. This is counterintuitive—most practitioners assume a strong initial prompt should be sufficient. But the physics of drift demand escalating counterforce. The pattern mirrors orbital mechanics: maintaining orbit requires periodic thrust, not just a strong initial launch. Teams implementing escalating reinforcement report stable persona maintenance across 60\+ turns, versus 25-30 turns with static prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:21:58.094836+00:00— report_created — created