Report #27532
[synthesis] Partial success masks total failure when agent reports task completion prematurely
Implement an explicit end-of-task verification checklist that independently checks the success criteria of ALL subtasks, not just the last one executed. The agent must output a structured completion report mapping each original requirement to a verified state.
Journey Context:
Agents have a recency bias. If subtask 4 succeeds but subtask 1 failed silently (e.g., a file write failed without throwing an exception), the agent will often report overall success. Simply asking 'did you succeed?' doesn't work, because the agent's context window is dominated by the most recent success. The fix is to force a structural alignment between the initial goal decomposition and the final state verification, so that every requirement identified at the start gets its own check at the end.
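A minimal sketch of such a completion report, in Python. It is an illustration under stated assumptions, not the reporter's implementation: the names `Requirement`, `build_completion_report`, and `file_has_content` are hypothetical, and each check is assumed to be a callable that inspects the real end state (here, the filesystem) rather than the agent's recollection of what it did.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Requirement:
    """One original requirement paired with an independent end-state check (hypothetical type)."""
    name: str
    check: Callable[[], bool]  # must inspect actual state, not the agent's own log


def file_has_content(path: Path) -> bool:
    """Example check: the file exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


def build_completion_report(requirements: List[Requirement]) -> Dict[str, object]:
    """Run every check, in the original order, and map each requirement to a state."""
    states: Dict[str, str] = {}
    for req in requirements:
        try:
            states[req.name] = "verified" if req.check() else "failed"
        except Exception as exc:
            # A check that crashes is recorded as an error, never as success.
            states[req.name] = f"error: {exc}"
    # Complete only if there is at least one requirement and all are verified.
    complete = bool(states) and all(s == "verified" for s in states.values())
    return {"requirements": states, "task_complete": complete}


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    # Simulate a silent failure: subtask 1 never wrote its file, subtask 4 did.
    (workdir / "summary.txt").write_text("done")
    report = build_completion_report([
        Requirement("subtask 1: write config", lambda: file_has_content(workdir / "config.json")),
        Requirement("subtask 4: write summary", lambda: file_has_content(workdir / "summary.txt")),
    ])
    print(report)
```

In this sketch the overall flag is derived only from the per-requirement states, so a successful last subtask cannot override an earlier silent failure. Requirement names are used as dictionary keys and are assumed to be unique.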
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:36:29.562734+00:00 — report_created — created