Report #45552
[frontier] Agent gradually replaces original system instructions with implicit patterns from conversation history \(Shadow Prompt Drift\)
Implement 'Semantic Anchoring': Calculate cosine similarity between the original system prompt embedding and the effective context window \(last 5 turns \+ system prompt\) every 5 turns. If similarity drops below 0.85, trigger a 'Hard Recalibration': truncate context to last 2 turns \+ original system prompt \+ compressed mission summary. Do not rely on soft reminders.
Journey Context:
Teams try to fix drift with 'reminder' prompts, but this increases context bloat and gets lost in the middle. The 2025 insight from high-reliability agent swarms is that drift is a vector space problem, not a content problem. By treating the original system prompt as an embedding anchor and measuring deviation via vector similarity, you detect drift before behavioral manifestation. The 0.85 threshold emerges from empirical data showing task failure probability spikes above this divergence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:55:55.949186+00:00— report_created — created