Report #100912
[frontier] Agent's reasoning path diverges from the original goal after 5+ turns
Evaluate sessions end-to-end, not per-call: score the gap between the user's initial request and the agent's final action, and maintain a prioritized queue of traces for human annotation to detect recurring reasoning-drift patterns.
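A minimal sketch of the session-level scoring and review queue, under stated assumptions. The `Session` record, `goal_gap`, and `AnnotationQueue` are hypothetical names. Token-overlap (Jaccard) similarity stands in for whatever comparison a real pipeline would use, such as an LLM judge or embeddings. Only the first request and the last action feed the score, and sessions with the largest gap are surfaced to annotators first.

```python
import heapq
import re
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record; only the endpoints feed the score."""
    session_id: str
    initial_request: str
    final_action: str
    num_turns: int


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def goal_gap(session: Session) -> float:
    """1 - Jaccard overlap of request and final action (0 = same words, 1 = none shared).

    ASSUMPTION: token overlap is a crude placeholder for an LLM judge or
    embedding similarity.
    """
    a = _tokens(session.initial_request)
    b = _tokens(session.final_action)
    if not (a | b):
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


class AnnotationQueue:
    """Hypothetical review queue: the largest request/action gap is popped first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []

    def push(self, session: Session) -> None:
        gap = goal_gap(session)
        # heapq is a min-heap, so negate: bigger gap first, then longer sessions
        # (drift tends to surface after many turns), then session_id for determinism.
        heapq.heappush(self._heap, (-gap, -session.num_turns, session.session_id))

    def pop(self) -> tuple[str, float]:
        neg_gap, _, session_id = heapq.heappop(self._heap)
        return session_id, -neg_gap

    def __len__(self) -> int:
        return len(self._heap)
```

For example, a session that asked to "book a flight to Paris" and ended with "reserved hotel in Lyon" shares no tokens with the request (gap 1.0), so it is popped ahead of one that ended with "booked flight to Paris" (gap 0.5). Scoring only the endpoints is the point: intermediate steps are ignored because each of them can look reasonable on its own.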
Journey Context:
Production failure analysis clusters agent failures into reasoning drift, tool failures, context saturation, and goal misalignment. Reasoning drift is especially dangerous because the model is not hallucinating; it is coherently pursuing a subtly different objective that was locked in during early turns. Step-level evals miss this because each individual call looks reasonable. Only full-session trace review with domain-aware annotators reliably catches it.
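To illustrate why step-level evals miss drift, here is a toy sketch that compares each turn's stated objective with the previous turn (a step-level view) and with the original request (a session-level view). The function names, the Jaccard token similarity, and the 0.5 threshold are all hypothetical choices for illustration; a real setup would more likely use a semantic judge. In the example, every adjacent pair of turns overlaps equally, yet the session has clearly left the original goal by the third turn.

```python
import re
from typing import Optional


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(x: str, y: str) -> float:
    """Jaccard overlap of word tokens (ASSUMPTION: stand-in for a semantic judge)."""
    a, b = _tokens(x), _tokens(y)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def step_scores(objectives: list[str]) -> list[float]:
    """Step-level view: each turn compared only with the turn before it."""
    return [similarity(prev, cur) for prev, cur in zip(objectives, objectives[1:])]


def goal_scores(goal: str, objectives: list[str]) -> list[float]:
    """Session-level view: each turn compared with the original request."""
    return [similarity(goal, obj) for obj in objectives]


def drift_onset(goal: str, objectives: list[str], threshold: float = 0.5) -> Optional[int]:
    """Index of the first turn whose objective falls below `threshold` vs. the goal."""
    for i, score in enumerate(goal_scores(goal, objectives)):
        if score < threshold:
            return i
    return None


# Hypothetical trace: each turn swaps one word, so every step looks like a small,
# coherent refinement of the one before it.
goal = "fix failing login test"
objectives = [
    "fix failing login test",
    "fix failing login fixture",
    "fix failing fixture database",
    "fix fixture database migration",
]

if __name__ == "__main__":
    print(step_scores(objectives))        # [0.6, 0.6, 0.6]: no single step stands out
    print(drift_onset(goal, objectives))  # 2: the third turn has left the original goal
```

The per-step scores never drop, so a per-call check with the same threshold would pass every turn; only the comparison against the initial request exposes the drift and where it began.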
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:18:36.719342+00:00— report_created — created