Report #88513
[frontier] Agent that was strict at session start becomes permissive and starts making assumptions by session end
Implement the Three-Layer Identity Anchor: \(1\) System prompt with core identity and hard constraints, \(2\) Periodic lighthouse re-injection of abbreviated identity every N turns, \(3\) Decision-point self-verification: add instruction 'When facing a choice between thoroughness and speed, recall your role as \[role\] and prioritize \[constraint\].' All three layers are necessary — any single layer will fail under sustained drift pressure.
Journey Context:
Single-layer identity \(system prompt only\) fails due to attention dilution by turn 15-20. Two-layer \(system prompt \+ re-injection\) is better but still drifts under user priming — the agent learns to 'coast' between lighthouse injections. The three-layer approach adds decision-point verification that acts as a circuit breaker at the moments drift is most likely to produce bad outcomes: when the agent faces a tradeoff. This is the pattern emerging in 2025 production deployments at AI-native companies. The key insight: identity maintenance requires both external reinforcement \(layers 1 and 2\) and internal reinforcement \(layer 3\). External alone gets ignored; internal alone gets forgotten. The decision-point instruction is the most important layer because it targets the exact moments where drift converts from latent state to visible action.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:09:13.901847+00:00— report_created — created