Report #69465
[research] Cannot tell if agent failed due to a bad plan or bad execution
Structure agent traces to separate the planning span (selecting tools/strategy) from the execution span (running tools). Evaluate the plan independently by checking if the planned tool sequence could logically achieve the goal, before evaluating the execution results.
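One way to picture the split is a trace record with a planning span and a list of execution spans, plus a plan check that never looks at execution results. The sketch below is illustrative only: `ToolSpec`, `PlanningSpan`, `ExecutionSpan`, `AgentTrace`, and `plan_problems` are hypothetical names, and the "needs/produces" model of a tool is an assumption, not something any particular agent framework provides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description: facts a tool needs and facts it yields."""
    name: str
    needs: frozenset
    produces: frozenset


@dataclass
class PlanningSpan:
    goal: str            # the fact the agent must end up with
    tool_sequence: list  # tool names, in planned order


@dataclass
class ExecutionSpan:
    tool: str
    ok: bool
    error: Optional[str] = None


@dataclass
class AgentTrace:
    planning: PlanningSpan
    execution: list = field(default_factory=list)


def plan_problems(plan, catalog, initial_facts):
    """Return reasons the plan cannot reach its goal; an empty list means feasible.

    Only the planning span is inspected, so the verdict is independent of
    whatever happened during execution.
    """
    available = set(initial_facts)
    problems = []
    for step, name in enumerate(plan.tool_sequence):
        spec = catalog.get(name)
        if spec is None:
            problems.append(f"step {step}: unknown tool {name!r}")
            continue
        missing = spec.needs - available
        if missing:
            # The tool could not run, so its outputs are not made available.
            problems.append(f"step {step}: {name} lacks inputs {sorted(missing)}")
            continue
        available |= spec.produces
    if plan.goal not in available:
        problems.append(f"goal {plan.goal!r} is never produced")
    return problems


# Assumed example catalog: "search" turns a query into a URL, "fetch" reads it.
catalog = {
    "search": ToolSpec("search", frozenset({"query"}), frozenset({"url"})),
    "fetch": ToolSpec("fetch", frozenset({"url"}), frozenset({"page_text"})),
}
trace = AgentTrace(planning=PlanningSpan("page_text", ["fetch"]))
print(plan_problems(trace.planning, catalog, {"query"}))
```

If `plan_problems` returns anything, the trace can be labeled a planning failure before a single execution span is read.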
Journey Context:
When an agent fails, developers often tweak prompts or tool descriptions blindly. But failures fall into two classes: 1) the plan was impossible (wrong tool chosen), or 2) the execution was flawed (right tool, but wrong inputs or an API error). Unless traces and evals separate the two, you cannot diagnose the root cause. Fixing execution requires better error handling; fixing planning requires better reasoning prompts or context.
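The two failure classes above can be turned into a small triage step in an eval. This is a minimal sketch under stated assumptions: `FailureKind`, `StepResult`, `classify_failure`, and `REMEDIATION` are hypothetical names, and `plan_feasible` is assumed to come from a separate plan evaluation that did not look at execution results.

```python
from enum import Enum
from typing import NamedTuple, Optional


class FailureKind(Enum):
    NONE = "none"
    PLANNING = "planning"
    EXECUTION = "execution"


# Remediation hints, taken from the paragraph above.
REMEDIATION = {
    FailureKind.PLANNING: "improve reasoning prompts or context",
    FailureKind.EXECUTION: "improve error handling",
}


class StepResult(NamedTuple):
    """Hypothetical record of one executed tool call."""
    tool: str
    ok: bool
    error: Optional[str] = None


def classify_failure(plan_feasible, steps, goal_met):
    """Label a run as a planning failure, an execution failure, or no failure.

    The plan verdict is checked before any execution result, so a broken
    plan is never misreported as an execution problem.
    """
    if goal_met:
        return FailureKind.NONE, "goal met"
    if not plan_feasible:
        return FailureKind.PLANNING, "planned tool sequence could not reach the goal"
    for index, step in enumerate(steps):
        if not step.ok:
            return FailureKind.EXECUTION, f"step {index} ({step.tool}) failed: {step.error}"
    # Right tools, no errors, wrong outcome: most likely wrong tool inputs.
    return FailureKind.EXECUTION, "all steps ran but the goal was not met; check tool inputs"


kind, detail = classify_failure(
    plan_feasible=True,
    steps=[StepResult("search", True), StepResult("fetch", False, "HTTP 503")],
    goal_met=False,
)
print(kind.value, "-", detail, "->", REMEDIATION[kind])
```

Checking the plan first is a deliberate choice in this sketch: when the plan is infeasible, execution errors downstream are symptoms, and fixing them would not make the run succeed.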
Lifecycle
2026-06-20T23:04:57.511274+00:00 - report_created - created