Report #75472
[frontier] Agent instructions slowly change meaning over 50+ turn sessions
Monitor the embedding cosine similarity between the original system prompt and the current context; trigger a hard reset when it falls below a 0.85 threshold
Journey Context:
Semantic drift occurs when iterative rephrasing and context accumulation shift the latent meaning of instructions through entailment and implicature. Simple string matching fails because contextual priming can change meaning without changing any words, and synonym substitution changes the words in ways an exact match cannot evaluate. Cosine similarity of embeddings (using text-embedding-3 or similar) captures semantic shift in latent space. The 0.85 threshold is empirically derived from production agent data, marking the point where drift becomes behaviorally actionable. The tradeoff is false positives on legitimate context expansion, so the threshold needs careful tuning per use case.
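A minimal sketch of the check described above, assuming the embedding model is wrapped in a callable that maps text to a vector. The names `DriftMonitor`, `needs_reset`, and `toy_embed` are hypothetical and not part of the original report; `toy_embed` is a word-count stand-in used only so the example runs without a network call, and a real deployment would pass in a function backed by an actual embedding model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as maximally dissimilar
    return dot / (norm_a * norm_b)


class DriftMonitor:
    """Compares the current context against the original system prompt."""

    def __init__(
        self,
        embed: Callable[[str], Vector],  # assumption: any text -> vector function
        system_prompt: str,
        threshold: float = 0.85,  # value from the report; tune per use case
    ) -> None:
        self._embed = embed
        self._baseline = embed(system_prompt)  # embedded once, at session start
        self.threshold = threshold

    def similarity(self, current_context: str) -> float:
        return cosine_similarity(self._baseline, self._embed(current_context))

    def needs_reset(self, current_context: str) -> bool:
        """True when similarity has dropped below the threshold."""
        return self.similarity(current_context) < self.threshold


# Hypothetical stand-in for a real embedding model: counts of a few fixed words.
VOCAB = ("cite", "sources", "formal", "jokes", "casual")


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


monitor = DriftMonitor(toy_embed, "cite sources formal")
drifted = monitor.needs_reset("jokes casual")                 # no overlap with baseline
expanded = monitor.needs_reset("cite sources formal jokes")   # baseline plus one new term
```

With the toy embedding, the context that shares no terms with the baseline trips the reset, while the context that only adds one term stays above 0.85. How the hard reset itself is performed (for example, re-injecting the original system prompt) is left to the caller, since the report does not specify it.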
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:16:35.236013+00:00— report_created — created