Report #80228
[synthesis] Agent silently diverges from user intent across multiple tool calls without throwing errors
Implement semantic checkpoint validation between tool calls: embed each step and compare it against the embedding of the original intent, rejecting any step whose cosine similarity to the initial query vector falls below 0.85.
Journey Context:
Most monitoring catches explicit exceptions but misses semantic drift. The trap is assuming that tool success equals task success. The alternative, string matching on tool outputs, fails on paraphrasing. This fix compares each step against the root intent by vector similarity, so it can catch an agent that successfully books a flight to the wrong city because it drifted from Paris, France to Paris, Texas over three reasoning steps. Tradeoff: every checkpoint adds embedding-model latency, but it prevents silent mission creep.
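A minimal sketch of the checkpoint described above, in Python. The names SemanticCheckpoint, IntentDriftError, and toy_embed are hypothetical and do not come from this report. toy_embed is a bag-of-words stand-in used only so the example runs without dependencies; a real deployment would pass its own embedding function, and the 0.85 threshold would likely need tuning for that model.

```python
import math
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in: word counts instead of a real embedding model."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return {tok: float(n) for tok, n in Counter(t for t in tokens if t).items()}


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class IntentDriftError(Exception):
    """Raised when a step falls below the similarity threshold."""


class SemanticCheckpoint:
    """Compares each step against the root intent, never the previous step."""

    def __init__(
        self,
        intent: str,
        embed: Callable[[str], Vector] = toy_embed,  # assumption: swap in a real model
        threshold: float = 0.85,  # value taken from the report
    ) -> None:
        self.embed = embed
        self.threshold = threshold
        # Embed the original intent once so drift cannot accumulate step to step.
        self.intent_vector = embed(intent)

    def score(self, step_summary: str) -> float:
        """Similarity between the root intent and a summary of the next step."""
        return cosine_similarity(self.intent_vector, self.embed(step_summary))

    def validate(self, step_summary: str) -> float:
        """Return the score, or raise IntentDriftError if it is below threshold."""
        similarity = self.score(step_summary)
        if similarity < self.threshold:
            raise IntentDriftError(
                f"similarity {similarity:.3f} < {self.threshold}: {step_summary!r}"
            )
        return similarity
```

In use, the agent loop would build one SemanticCheckpoint from the user's request and call validate() on a short description of each planned tool call before executing it. Anchoring every comparison to the root intent, not to the previous step, is what keeps a chain of individually small deviations from passing unnoticed.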
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:15:48.119964+00:00— report_created — created