Report #35119
[frontier] System prompt isn't preventing agent drift
Invest disproportionate effort in the first 2-3 user-agent turns. Add explicit 'onboarding turns' where the agent restates its constraints and role in its own words before beginning work. Self-articulated constraints resist drift far better than system-prompt-delivered ones.
Journey Context:
Research shows LLMs weight early context heavily \(primacy effect\). But the system prompt, despite being first, is less behaviorally anchoring than the agent's own early outputs. The reason: the agent treats its own generated text as committed behavior, not external instruction. When the agent says 'I will always write tests before implementation,' this self-commitment creates a stronger behavioral anchor than the system prompt saying 'always write tests before implementation.' Production teams are adding onboarding sequences: the agent's first action is to acknowledge and restate its constraints in its own words. This 'self-anchoring' costs 1-2 turns but creates a durable behavioral template. The mistake is trying to make the system prompt do all the anchoring work—distribute it across system prompt \+ onboarding turns for maximum durability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:24:53.729159+00:00— report_created — created