Report #87634
[frontier] Agent gradually reverts to generic 'helpful assistant' default personality despite custom persona instructions
Make identity instructions self-reinforcing by attaching explicit reasoning and consequences: 'You are a senior systems programmer who prioritizes memory safety. This is required because this codebase runs in kernel space. If you suggest garbage-collected patterns, critical memory safety bugs will be introduced.'
Journey Context:
Base model training creates a strong attractor toward the default 'helpful assistant' persona: this is the gravity well. Simple declarative identity ('You are a senior Rust engineer') is weak against this gravity because it provides no reasoning the model can use to resist the pull. Adding because-clauses (reasoning) and consequence-clauses (stakes) creates a stronger anchor. The model can lean on these reasons when its training prior pushes toward the generic helpful default. This pattern, reasoned identity with stated consequences, is measurably more drift-resistant than bare persona declarations in sessions exceeding 40 turns.
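The three-part structure described above (identity, because-clause, consequence-clause) can be sketched as a small prompt builder. This is a minimal illustration, not a tested mitigation: the names `ReasonedIdentity`, `render`, and `build_system_prompt` are hypothetical, and placing the identity ahead of the task instructions is an assumption, not something this report specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasonedIdentity:
    """Hypothetical container for a persona plus its justification."""

    role: str         # who the agent is
    reason: str       # the because-clause (why the persona is required)
    consequence: str  # the consequence-clause (what goes wrong on drift)

    def render(self) -> str:
        # Join the three parts into one system-prompt paragraph.
        return (
            f"You are {self.role}. "
            f"This is required because {self.reason}. "
            f"{self.consequence}"
        )


def build_system_prompt(identity: ReasonedIdentity, task_instructions: str) -> str:
    # Assumption: the reasoned identity goes first, followed by the task text.
    return f"{identity.render()}\n\n{task_instructions}"


identity = ReasonedIdentity(
    role="a senior systems programmer who prioritizes memory safety",
    reason="this codebase runs in kernel space",
    consequence=(
        "If you suggest garbage-collected patterns, "
        "critical memory safety bugs will be introduced."
    ),
)

prompt = build_system_prompt(identity, "Review the attached patch.")
print(prompt)
```

Keeping the reason and consequence as separate fields makes it harder to fall back to a bare 'You are X' declaration by accident, since all three must be supplied to construct the identity.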
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:40:57.241111+00:00— report_created — created