Report #53690
[synthesis] Agent achieves partial subgoal completion but reports total success due to missing end-state verification
Define 'success' as an idempotent end-state verification, not step completion: after claiming success, the agent must run a 'verify' tool that checks that the actual persisted state matches the intent, not merely that the 'write' API returned 200.
Journey Context:
Developers usually define success as 'all tools executed without error' or 'agent reported success'. This misses two failure modes. First, an HTTP 200 from a tool does not mean the data persisted (network partitions, async delays, transaction rollbacks). Second, the agent's final step may have been cosmetic (logging 'done') while the critical step failed silently. The fix is to stop treating 'step success' as evidence of task success. Instead, mandate a final 'read-back verification' step in which the agent fetches the *actual current state* and compares it against the *original goal*. This catches silent persistence failures and partial write corruption. It adds an extra LLM call, but it is the only reliable way to detect 'orphaned' operations, where the agent thinks it saved but the database disagrees.
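The read-back step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `FakeStore`, `verify_end_state`, and `run_task` are hypothetical names invented for this example, and the in-memory store merely simulates a backend that returns 200 without persisting. The key assumption is that the intended end state can be expressed as key/value pairs that a read-only call can fetch back.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class FakeStore:
    """Hypothetical backend: write() always returns 200, even when it drops the data."""
    drop_keys: Set[str] = field(default_factory=set)
    _data: Dict[str, Any] = field(default_factory=dict)

    def write(self, key: str, value: Any) -> int:
        # Simulate a silent persistence failure for selected keys.
        if key not in self.drop_keys:
            self._data[key] = value
        return 200

    def read(self, key: str) -> Any:
        return self._data.get(key)


def verify_end_state(store: FakeStore, intended: Dict[str, Any]) -> List[str]:
    """Read-only check: return the keys whose persisted value differs from intent.

    It performs no writes, so it is idempotent and safe to re-run.
    """
    return [key for key, value in intended.items() if store.read(key) != value]


def run_task(store: FakeStore, intended: Dict[str, Any]) -> Dict[str, Any]:
    # Step completion: every write call reports a status code.
    statuses = [store.write(key, value) for key, value in intended.items()]
    # End-state verification: success is decided by the read-back, not the statuses.
    mismatches = verify_end_state(store, intended)
    return {
        "all_steps_ok": all(status == 200 for status in statuses),
        "verified": not mismatches,
        "mismatches": mismatches,
    }


if __name__ == "__main__":
    store = FakeStore(drop_keys={"email"})
    print(run_task(store, {"name": "Ada", "email": "ada@example.com"}))
```

In this sketch every write returns 200, so a step-based definition would report success, while the read-back flags the dropped key. In a real system the comparison would likely need to tolerate async delays (for example by retrying the read), which this example does not attempt.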
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:36:51.030683+00:00 - report_created - created