Report #75891
[synthesis] Agent successfully completes sub-tasks but final output misses the original objective
Calculate the cosine similarity between the embedding of the original user goal and the embedding of the agent's current sub-task prompt at each step. Alert if the similarity drops below a dynamic threshold before the final step.
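A minimal sketch of this check follows, using only the Python standard library. The report does not say how the dynamic threshold is computed, so the rule here is an assumption: the rolling mean of earlier similarities minus k standard deviations, never lower than a fixed floor. The class name GoalDriftMonitor, the parameters floor, k and warmup, and the embed callable are all hypothetical; embed stands in for whatever embedding model the agent stack already uses.

```python
import math
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class GoalDriftMonitor:
    """Compares each sub-task prompt against the original goal embedding.

    Assumption (not from the report): the dynamic threshold is
    max(floor, mean(history) - k * stdev(history)) once `warmup`
    similarities have been recorded, and just `floor` before that.
    """

    def __init__(
        self,
        goal: str,
        embed: Callable[[str], Sequence[float]],  # hypothetical embedding function
        floor: float = 0.3,
        k: float = 2.0,
        warmup: int = 3,
    ) -> None:
        self.embed = embed
        self.goal_vec = embed(goal)  # embed the goal once, reuse at every step
        self.floor = floor
        self.k = k
        self.warmup = warmup
        self.history: List[float] = []

    def threshold(self) -> float:
        """Current alert threshold, derived from similarities seen so far."""
        if len(self.history) < self.warmup:
            return self.floor
        return max(self.floor, mean(self.history) - self.k * pstdev(self.history))

    def check(self, subtask_prompt: str) -> Tuple[float, bool]:
        """Return (similarity, drifted) for one step and record the similarity."""
        sim = cosine_similarity(self.goal_vec, self.embed(subtask_prompt))
        drifted = sim < self.threshold()
        self.history.append(sim)
        return sim, drifted
```

Call check() with each sub-task prompt before the step executes, and raise the alert when drifted is True on any step before the final one. The floor, k and warmup values are placeholders and would need tuning against real traces.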
Journey Context:
Agents decompose tasks into sub-tasks. Over many steps, the agent optimizes for local sub-task completion, drifting from the global objective. Because each sub-task returns a 200 OK and passes its local validation, standard step-by-step monitoring looks green. The degradation is purely semantic. Only by continuously embedding and comparing the current action's intent against the initial goal can you catch the drift before the final output is generated.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:58:42.610902+00:00— report_created — created