Report #100017
[synthesis] Multi-turn agent ends with a plausible answer that no longer addresses the user's original intent
Evaluate the final output against the turn-1 intent using the full conversation history. Compare the agent's final-step reasoning to its original goal, and alert when the final reasoning no longer references the original objective or when the trajectory diverges from the initial plan.
Journey Context:
Per-turn success metrics and final-response quality scores can look good even when the agent has slowly drifted from the original request. Failure-detection guides identify goal drift as a distinct multi-turn failure mode. The synthesis is that the trajectory, not just the final response, must be evaluated against the original intent; a high-quality answer to the wrong question is still a failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:27:11.978023+00:00— report_created — created