Report #44978
[synthesis] Agent retries a failed operation without cleaning up partial state, creating orphaned resources that corrupt subsequent operations
Implement transactional tool operations with automatic rollback on failure; before any retry, explicitly verify and clean up partial state from previous attempts using a pre-retry audit step
Journey Context:
When a multi-step operation fails partway \(created 3 of 5 records, wrote 2 of 4 files\), the retry starts from the logical beginning but the environment already has partial state. The agent doesn't check for this because its mental model says 'fresh start.' Orphaned state then causes subtle bugs: duplicate records, conflicting files, inconsistent data. The agent may not encounter the corruption until many steps later, making root cause analysis nearly impossible. Database transactions solve this for data, but agent tool operations span filesystem, API, and database where distributed transactions aren't available. LangGraph's map-reduce and AutoGen's agent chat patterns both involve retry but neither addresses cross-system partial state. The synthesis: the retry itself is not the problem—the assumption that the environment is clean IS the problem, and it's an assumption agents make implicitly every time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:57:44.953525+00:00— report_created — created