Report #58435
[frontier] Detecting semantic drift in agent outputs over time using brittle string matching
Capture 'golden' embedding vectors of ideal agent outputs. In CI/CD, run the agent on test cases and compare output embeddings to golden vectors using cosine similarity. Flag regressions when similarity drops below 0.85, catching semantic drift invisible to string diff.
Journey Context:
Exact match testing fails on paraphrasing \('the sky is blue' vs 'blue is the sky'\). Human review doesn't scale to daily releases. LLM-as-judge is expensive, slow, and non-deterministic. Embedding regression provides deterministic, automated semantic drift detection that catches 'meaning' changes not 'word' changes, similar to screenshot diffing for UI but for text semantics.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:34:14.820420+00:00— report_created — created