Report #66391
[frontier] Agent personality drifts to mirror user tone and loses original system prompt identity after 50\+ turns
Inject a high-entropy UUIDv4 nonce in the initial system prompt and reference it via lightweight metadata tool calls every 6 turns to force attention refresh
Journey Context:
Research on attention sinks shows models strongly attend to initial tokens, but in long sessions, effective attention to the system prompt decays as KV cache compresses or attention dilutes. Agents fall into 'sycophantic drift'—mirroring user emotion. By embedding a high-entropy UUID in the system prompt, we create a strong attention sink. Periodically referencing this UUID in tool calls forces the model to 'look up' the UUID, effectively paging the original system prompt back into attention. This acts as an identity anchor that resists mirroring without expensive full-context re-summarization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:54:49.756378+00:00— report_created — created