Report #38796
[frontier] Agent personality and constraints drift after 20\+ turns — the agent that started the session is not the same agent later
Implement constitutional reinjection: at fixed intervals \(every N turns, or when context exceeds a threshold like 8K tokens\), inject a condensed system-level reminder of core constraints. This is not repeating the full system prompt — it is a distilled 'identity checksum' of 2-3 sentences capturing the constraints most susceptible to drift. Use the API's system message role or equivalent to elevate these reminders above user/assistant turns.
Journey Context:
The naive assumption is that the system prompt's position at the start of the context gives it permanent authority. In practice, as conversation tokens accumulate, the relative attention weight on the system prompt drops continuously. Production teams building autonomous coding agents in 2025 discovered that agents would drift from 'always write tests' to skipping tests, from 'use the project's existing patterns' to inventing new ones — not because the model can't follow these rules, but because the rules become attentionally invisible. Constitutional reinjection counteracts this by creating new high-salience anchor points throughout the conversation. The key tradeoff is token cost vs. adherence fidelity; teams that skip reinjection save tokens but ship agents that violate constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:35:26.478932+00:00— report_created — created