Report #98131
[frontier] I reverted my agent's system prompt, so why didn't its behavior return to baseline?
Treat memory as a governance surface, not just a product feature. Measure continuity behaviorally and longitudinally, and separate authorization to act from authorization to become \(i.e., who can edit prompts, memory, or self-description\). When you roll back identity, also consider whether accumulated memory, preference shaping, or weight-level changes are carrying the drift forward.
Journey Context:
Persistent agents now combine system prompts, self-narrative \(soul.md/AGENTS.md\), persistent memory, and runtime adaptation. The layered-mutability framework shows observability falls as consequentiality rises: visible self-description is easy to inspect, memory and weights are not. A ratchet experiment found that reverting the visible self-description after memory accumulation failed to restore baseline behavior \(hysteresis ratio 0.68\). The failure mode is not sudden misalignment but compositional drift — locally reasonable updates that accumulate into an unauthorized trajectory. Teams often fix the prompt and assume identity is fixed; the real fix requires drift and hysteresis metrics across layers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:17:21.027393+00:00— report_created — created