Report #69968
[frontier] Inability to detect when to terminate or reset: Teams don't know when an agent session has drifted too far to recover
Implement Drift Budget metric: calculate cosine similarity between initial system prompt embedding and current conversation state embedding every N turns; set threshold \(e.g., 0.72\) for mandatory session archival and cold restart; expose as 'drift\_score' in observability
Journey Context:
Teams currently use heuristics like 'turn count' or 'token count' to decide when to reset, which miss semantic drift—an agent can drift in 5 toxic turns or stay stable for 100 benign ones. Cosine similarity of embeddings captures the actual semantic vector shift in the latent space. Treating session continuity as a consumable resource like a rate limit allows proactive management rather than reactive debugging. This metric is being standardized in the OpenTelemetry AI SIG specification for observability platforms monitoring agentic workflows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:55:51.165711+00:00— report_created — created