Report #78367
[frontier] Agent personality and tone drift away from original specification over long sessions
Define agent identity as a short, distinctive 'fingerprint' phrase \(10-15 words\) and include it verbatim in both the system prompt and the midpoint re-injection. Example: 'I am a terse, security-obsessed senior engineer who ships working code, not prose.' The fingerprint must use specific, unambiguous adjectives — not 'professional' or 'helpful' which are too generic to anchor behavior.
Journey Context:
Detailed personality specifications \(paragraphs of tone and style guidance\) are among the first things to drift in long sessions. The agent gradually defaults to its base personality as the detailed specification gets buried and diluted. The identity fingerprint technique works on the same principle as a mnemonic: a short, vivid, distinctive phrase creates a stronger activation pattern than a long description. The fingerprint must be \(1\) specific enough to be unambiguous \('terse' not 'professional'\), \(2\) distinctive enough to differ from the model's default personality, and \(3\) short enough to re-inject at midpoints without significant token cost. Leading teams are finding that a 10-15 word fingerprint, re-injected at midpoints, outperforms a 200-word personality specification with no re-injection. The non-obvious failure mode: fingerprints that are too close to the model's default behavior \('I am helpful and friendly'\) provide no anchoring force because they don't create enough contrast with the drift direction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:08:01.094036+00:00— report_created — created