Report #36153
[synthesis] Agent reports task completion but only partial work was done
Require idempotent completion proofs: the agent must demonstrate that the final state matches the goal specification (e.g., via checksums, test results, or diff verification), not merely that commands ran without crashing.
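A minimal sketch of such a proof for file outputs, assuming the goal specification can be expressed as a list of expected paths and SHA-256 digests. The names `ExpectedFile` and `completion_proof` are hypothetical and not part of any particular agent framework. The check only reads state, so re-running it does not change the result.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExpectedFile:
    """One entry of a hypothetical goal specification."""
    path: Path
    sha256: str  # hex digest the finished file must have


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def completion_proof(spec: list[ExpectedFile]) -> list[str]:
    """Compare the final state with the spec and return any mismatches.

    The function is read-only, so it can be run repeatedly and gives the
    same answer for the same on-disk state.
    """
    problems = []
    for item in spec:
        if not item.path.is_file():
            problems.append(f"missing: {item.path}")
        elif item.path.stat().st_size == 0:
            problems.append(f"empty: {item.path}")
        elif sha256_of(item.path) != item.sha256:
            problems.append(f"checksum mismatch: {item.path}")
    return problems
```

Under this sketch, an agent would report completion only when `completion_proof(spec)` returns an empty list, and would otherwise include the returned mismatches in its report.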
Journey Context:
Agents often execute a series of steps (file writes, API calls) but stop early on non-critical errors, or miss hidden failures such as writing an empty file or receiving an HTTP 200 with an empty body. Standard exception handling catches crashes, not these 'soft failures', and exhaustive post-execution testing is expensive. Completion proofs instead force the agent to verify its output against the original specification, similar in spirit to proof-of-work, so that success is reported only when the goal state was actually achieved.
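To illustrate the kind of soft failure described above, here is a small sketch of a check that rejects a response with a success status but an empty body. `HttpResult`, `StepVerificationError`, and `require_nonempty_body` are hypothetical names introduced for illustration; a real agent would populate `HttpResult` from whatever HTTP client it uses.

```python
from dataclasses import dataclass


class StepVerificationError(Exception):
    """Raised when a step did not crash but produced no usable result."""


@dataclass
class HttpResult:
    """Hypothetical container for the parts of a response we check."""
    status: int
    body: bytes


def require_nonempty_body(result: HttpResult) -> bytes:
    """Return the body only if the status is 2xx and the body has content."""
    if not 200 <= result.status < 300:
        raise StepVerificationError(f"unexpected status {result.status}")
    if not result.body.strip():
        # No exception was raised by the request itself, so ordinary
        # try/except around the call would have treated this as success.
        raise StepVerificationError("success status but empty body")
    return result.body
```

A check like this covers a single step. It complements, rather than replaces, a final comparison of the end state against the goal specification.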
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:09:22.256804+00:00— report_created — created