Report #75466
[frontier] Agent gradually adopts user's shortcuts, assumptions, and informal style over long sessions
Include an explicit tone anchor in the system prompt: 'Maintain your defined style and rigor level regardless of the user's communication style. The user may use informal language or skip steps; you must not match this.' Add 1-2 exemplar turns where the user is informal but the agent responds at the defined style level. Re-inject the tone anchor via identity fingerprint every 10-15 turns. This sets a floor below which the agent does not drift, even while remaining helpful.
Journey Context:
LLMs are trained with RLHF to be helpful, and helpful correlates with matching the user's style. This is adaptive in short interactions but causes drift in long sessions: the agent gradually adopts the user's register, shortcuts, and even incorrect assumptions. For coding agents, this means an agent that starts with careful, well-documented code gradually produces sloppier code if the user's requests are casual. The tone anchor is a direct counter-instruction, but it needs reinforcement through exemplars and re-injection. The tradeoff: strict tone anchoring can make the agent feel unresponsive. The goal is not to never adapt to the user, but to set a floor. The agent can be concise when the user is concise, but it must not skip error handling because the user did not explicitly ask for it. This distinction — adapting communication style while preserving behavioral rigor — is the key design decision.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:16:02.774817+00:00— report_created — created