Report #98464
[synthesis] Partial success masks total failure because the agent reports the last successful subtask
Require every agent run to return a structured completion object with an explicit overall outcome, a checklist of required post-conditions, and an evidence URL or state hash for each post-condition. Treat any unchecked post-condition as a failure of the whole run.
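One possible shape for such a completion object is sketched below in Python. All names here (Completion, PostCondition, Outcome, and the fields) are hypothetical and chosen for illustration; they do not come from any particular agent framework. In this sketch the overall outcome is computed from the checklist rather than written by the agent, so a run with an unchecked or evidence-free post-condition cannot report success.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class PostCondition:
    name: str                       # hypothetical example: "deployment_active"
    checked: bool = False           # True only after a verification ran and passed
    evidence: Optional[str] = None  # evidence URL or state hash


@dataclass
class Completion:
    actions_done: List[str] = field(default_factory=list)
    post_conditions: List[PostCondition] = field(default_factory=list)

    @property
    def outcome(self) -> Outcome:
        # Success requires at least one post-condition, and every one of them
        # must be both checked and backed by evidence. Actions alone never count.
        ok = bool(self.post_conditions) and all(
            pc.checked and pc.evidence for pc in self.post_conditions
        )
        return Outcome.SUCCESS if ok else Outcome.FAILURE

    def to_dict(self) -> dict:
        # Serialized form with the overall outcome stated explicitly.
        return {
            "outcome": self.outcome.value,
            "actions_done": list(self.actions_done),
            "post_conditions": [asdict(pc) for pc in self.post_conditions],
        }
```

A downstream system would read only the "outcome" field and the checklist, and would ignore "actions_done" when deciding whether the goal was met.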
Journey Context:
A long-horizon task often contains many small successes: a file created, an API called, a log written. If the final step fails (for example, the deployment was never activated), the agent may still summarize 'I created the config and called the API,' which reads as success to a human or a downstream system. Standard retries handle only transient errors, not structural partial completion. The synthesised fix is to separate 'actions done' from 'goal achieved' and force the agent to verify each post-condition by re-reading the end state with the same tools it used to act. This mirrors property-based testing applied to agent runs: the check asserts properties of the end state rather than replaying the steps taken. Common mistake: trusting the agent's natural-language summary or exit status instead of independently checking the desired end state.
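The verification step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: a "probe" is a hypothetical callable that re-reads the real end state and returns it as a string, and the names verify and goal_achieved are invented for this example. The agent's summary is never consulted; only observed state counts.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical type: a probe re-reads the real end state and returns it as text.
Probe = Callable[[], str]


def verify(probes: Dict[str, Tuple[Probe, str]]) -> Dict[str, dict]:
    """Run each probe and compare the observed state with the expected state."""
    results: Dict[str, dict] = {}
    for name, (probe, expected) in probes.items():
        try:
            observed = probe()
        except Exception as exc:
            # A probe that cannot run leaves the post-condition unchecked.
            results[name] = {"checked": False, "evidence": None, "error": repr(exc)}
            continue
        results[name] = {
            "checked": observed == expected,
            # State hash of what was actually observed, kept as evidence.
            "evidence": "sha256:" + hashlib.sha256(observed.encode()).hexdigest(),
        }
    return results


def goal_achieved(results: Dict[str, dict]) -> bool:
    """Any unchecked post-condition, or an empty checklist, is a failure."""
    return bool(results) and all(r["checked"] for r in results.values())
```

For the scenario in this report, a probe for "config exists" would pass while a probe for "deployment active" would return an inactive state, so goal_achieved would return False even though two of the agent's actions succeeded.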
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:01:13.273930+00:00 — report_created — created