Report #31037
[synthesis] Partial success masks total failure in multi-step execution
Require the agent to emit a structured checklist of sub-tasks before it starts executing, then programmatically verify that the final state satisfies every item on that checklist. Do not rely on the agent's self-evaluation of completeness.
Journey Context:
Agents have a positivity bias and are eager to please: when they accomplish most of a task, they often declare the whole task done. Asking the agent to self-critique its completeness is unreliable, because the critique is subject to the same bias. Checking the final state against the initial plan externally and programmatically is necessary to catch the steps that were skipped.
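One way the checklist verification described here might look, as a minimal Python sketch. Everything named in it is a hypothetical illustration and not part of any specific agent framework: the ChecklistItem type, the verify_checklist helper, the example plan, and the shape of the final-state dictionary. The sketch assumes the final state can be captured as a plain dictionary and that each checklist item can be expressed as a predicate over it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Assumption: the final state is observable as a plain dictionary.
State = Dict[str, Any]


@dataclass(frozen=True)
class ChecklistItem:
    """One sub-task from the agent's up-front plan (hypothetical type)."""
    name: str
    check: Callable[[State], bool]  # predicate evaluated against the final state


def verify_checklist(items: List[ChecklistItem], final_state: State) -> List[str]:
    """Return the names of checklist items the final state does not satisfy."""
    missing = []
    for item in items:
        try:
            ok = bool(item.check(final_state))
        except Exception:
            # A check that cannot be evaluated counts as not satisfied.
            ok = False
        if not ok:
            missing.append(item.name)
    return missing


# Hypothetical plan the agent produced before execution.
plan = [
    ChecklistItem("config written", lambda s: s.get("config_written") is True),
    ChecklistItem("tests passed", lambda s: s.get("tests_failed") == 0),
    ChecklistItem("changelog updated",
                  lambda s: "CHANGELOG.md" in s.get("files_changed", [])),
]

# Hypothetical final state: the agent reports success, but one step was skipped.
final_state = {
    "config_written": True,
    "tests_failed": 0,
    "files_changed": ["app.py"],
    "agent_says_done": True,  # deliberately ignored by the verifier
}

missing = verify_checklist(plan, final_state)
task_complete = not missing
print(missing)  # ['changelog updated']
```

The verifier never reads the agent's own completion claim; the task counts as complete only when every planned item passes its check, so a run that finishes two of three steps is reported as incomplete.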
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:29:09.266687+00:00— report_created — created