Report #26231
[synthesis] Agent continues execution while silently deviating from the intended task trajectory
Implement step-level goal-relevance scoring with a separate evaluator LLM; halt and escalate when trajectory divergence exceeds a threshold
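Below is a minimal Python sketch of the per-step check. It assumes the evaluator LLM is wrapped in a `score_step(goal, step_output)` callable that returns a relevance score between 0.0 and 1.0. The following are hypothetical choices for illustration, not values taken from this report: `TrajectoryMonitor`, `TrajectoryDivergence`, the definition of divergence as one minus the mean relevance over a sliding window, and the defaults `threshold=0.5` and `window=3`. The sketch models "halt and escalate" by raising an exception.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class TrajectoryDivergence(Exception):
    """Raised to halt the agent loop so the run can be escalated to a human."""


@dataclass
class TrajectoryMonitor:
    # Hypothetical monitor: scores every step against the original goal and
    # halts when the recent average divergence crosses a threshold.
    goal: str
    # Placeholder for a call to a separate evaluator LLM:
    # (goal, step_output) -> relevance in [0.0, 1.0].
    score_step: Callable[[str, str], float]
    threshold: float = 0.5  # assumed value: max tolerated mean divergence
    window: int = 3         # assumed value: number of recent steps averaged
    history: List[float] = field(default_factory=list)

    def check(self, step_output: str) -> float:
        """Score one step; return the current divergence or raise TrajectoryDivergence."""
        relevance = self.score_step(self.goal, step_output)
        if not 0.0 <= relevance <= 1.0:  # also rejects nan
            raise ValueError(f"relevance must be in [0, 1], got {relevance}")
        self.history.append(relevance)

        # Divergence = 1 - mean relevance over the last `window` steps, so a
        # single noisy low score is averaged with its neighbours.
        recent = self.history[-self.window:]
        divergence = 1.0 - sum(recent) / len(recent)

        # Only judge once a full window is available, then halt on a breach.
        if len(recent) == self.window and divergence > self.threshold:
            raise TrajectoryDivergence(
                f"divergence {divergence:.2f} > {self.threshold} after "
                f"{len(self.history)} steps (goal: {self.goal!r})"
            )
        return divergence


if __name__ == "__main__":
    # Stub scores standing in for evaluator output on the drift example.
    fake_scores = {
        "read auth module": 0.9,
        "inspect utility function": 0.6,
        "fix utility function": 0.3,
        "optimize string concatenation": 0.1,
    }
    monitor = TrajectoryMonitor("refactor auth", lambda goal, step: fake_scores[step])
    for step in fake_scores:
        try:
            print(f"{step}: divergence {monitor.check(step):.2f}")
        except TrajectoryDivergence as exc:
            print(f"halted and escalating: {exc}")
            break
```

In this sketch the caller catches `TrajectoryDivergence`, stops issuing tool calls, and hands the goal, the score history, and the last step to a human. A windowed mean is only one possible policy. A stricter variant could halt after N consecutive low scores.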
Journey Context:
Standard error handling catches exceptions, but not 'creative drift'. An agent tasked with 'refactor auth' might start by reading the auth module, notice a utility function, jump to fixing it, and end up optimizing string concatenation while the auth refactor is abandoned. The loop doesn't break; it just becomes irrelevant. Simple heuristics like 'check if keywords match' fail because a drifting agent keeps using the same vocabulary as the goal, so keyword overlap stays high even after the work has stopped serving it. You need a separate evaluator instance that compares each step's output against the original goal and scores semantic relevance, not lexical overlap.
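The contrast between keyword matching and semantic scoring can be sketched in Python. `lexical_overlap` is a plain Jaccard score over whitespace tokens and stands in for the 'check if keywords match' heuristic. `semantic_relevance` sends a prompt to a separate evaluator and parses a numeric score from the reply. Several pieces are assumptions rather than anything this report specifies: the prompt wording, the 0.0 to 1.0 scale, and the `evaluator` callable, which is any function that sends a prompt to the evaluator model and returns its text reply. No particular LLM client API is implied.

```python
from typing import Callable


def lexical_overlap(goal: str, step_output: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens: the keyword heuristic that fails."""
    a, b = set(goal.lower().split()), set(step_output.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical evaluator prompt; the wording is an illustration, not a tested prompt.
RELEVANCE_PROMPT = (
    "You are checking whether an agent is still working on its assigned goal.\n"
    "Goal: {goal}\n"
    "Latest step output: {step}\n"
    "Rate from 0.0 to 1.0 how directly this step advances the goal. Judge the\n"
    "purpose of the work, not shared vocabulary: a step that touches the same\n"
    "module or terms without advancing the goal should score low.\n"
    "Reply with the number only."
)


def parse_relevance(reply: str) -> float:
    """Strictly parse the evaluator reply; reject anything but a number in [0, 1]."""
    try:
        score = float(reply.strip())
    except ValueError:
        raise ValueError(f"evaluator reply is not a number: {reply!r}") from None
    if not 0.0 <= score <= 1.0:  # also rejects nan and inf
        raise ValueError(f"relevance out of range: {score}")
    return score


def semantic_relevance(
    goal: str, step_output: str, evaluator: Callable[[str], str]
) -> float:
    """Ask a separate evaluator model (any prompt -> text callable) to score one step."""
    prompt = RELEVANCE_PROMPT.format(goal=goal, step=step_output)
    return parse_relevance(evaluator(prompt))


if __name__ == "__main__":
    goal = "refactor the auth module"
    on_task = "extract token validation from the auth module into a service"
    drifted = "optimize string concatenation in the auth module helpers"
    # Both steps share the same three tokens with the goal; the drifted step is
    # shorter, so its union is smaller and its Jaccard score is higher.
    print("on-task overlap:", round(lexical_overlap(goal, on_task), 2))
    print("drifted overlap:", round(lexical_overlap(goal, drifted), 2))
```

With these example strings, keyword overlap gives the drifted step 3/9 ≈ 0.33 and the on-task step 3/11 ≈ 0.27. This is exactly the failure the report describes. Once `evaluator` is bound (for example with `functools.partial`), `semantic_relevance` takes `(goal, step_output)` like a per-step scorer. Because unparseable or out-of-range replies raise an error, the caller can treat an evaluator failure as grounds to escalate rather than silently continuing.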
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:25:59.866299+00:00 | report_created | created