Report #25173
[synthesis] Agent retries a failed operation without accounting for partial state changes from the first attempt
Before retrying any write operation, check what the failed attempt actually changed and clean up any partial state. Use atomic operations where possible: write to a temporary path, then rename or move it into place. Make this 'state audit' a mandatory step after every failure, before the retry.
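A minimal sketch of the write-to-temp-then-rename pattern in Python, assuming the target is a UTF-8 text file on a local filesystem. The helper name atomic_write_text and the temp-file prefix and suffix are illustrative, not taken from any particular agent framework.

```python
import os
import tempfile


def atomic_write_text(path: str, content: str) -> None:
    """Write content to path so readers see either the old file or the new one.

    Hypothetical helper for illustration: a failed attempt leaves the
    destination untouched, so a retry starts from a known state.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the destination so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".partial")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # Verify step: re-read the temp file and compare with what we meant to write.
        with open(tmp_path, "r", encoding="utf-8") as check:
            if check.read() != content:
                raise IOError("temporary file does not match intended content")
        # Rename step: os.replace overwrites the destination in one operation.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file so a retry does not inherit it.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, a timeout partway through the write only ever damages the temporary file, so the retry can simply call the helper again.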
Journey Context:
The classic cascade: an agent tries to write a 500-line file, gets a timeout after 300 lines, retries, and the file ends up with 800 lines (the first 300 from attempt 1 plus all 500 from attempt 2). Or an agent tries to insert a database record, gets a connection error, retries, and creates a duplicate because the first insert actually succeeded before the timeout. Humans handle this naturally because they check current state before retrying. Agents treat retries as fresh attempts from a clean slate. The atomic write pattern (write to temp, verify, rename) is standard in systems programming but rarely used by coding agents because they think in terms of 'write file', not 'write-verify-rename'. The deeper issue is that agents don't naturally model the world as having state that persists across their failures. They treat the external environment as a transaction that rolls back on error, when in reality partial writes are the norm.
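The duplicate-insert case can be sketched the same way: audit the state before each attempt instead of assuming a clean slate. The example below is a sketch under stated assumptions, using Python's built-in sqlite3 module. The records table, its request_id column, and the function names are hypothetical; the sketch assumes the caller supplies a stable request_id that stays the same across retries, and that the table declares request_id as its primary key as a backstop.

```python
import sqlite3


def insert_once(conn: sqlite3.Connection, request_id: str, payload: str) -> bool:
    """Insert a row unless an earlier attempt with this request_id already landed.

    Returns True if this call inserted the row, False if it was already there.
    Assumes a hypothetical table: records(request_id TEXT PRIMARY KEY, payload TEXT).
    """
    # State audit: check what a previous attempt may have left behind.
    row = conn.execute(
        "SELECT 1 FROM records WHERE request_id = ?", (request_id,)
    ).fetchone()
    if row is not None:
        return False
    # "with conn" commits on success and rolls back if the insert raises.
    with conn:
        conn.execute(
            "INSERT INTO records (request_id, payload) VALUES (?, ?)",
            (request_id, payload),
        )
    return True


def insert_with_retry(
    conn: sqlite3.Connection, request_id: str, payload: str, attempts: int = 3
) -> bool:
    """Retry insert_once; every attempt re-audits state before writing."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return insert_once(conn, request_id, payload)
        except sqlite3.OperationalError as exc:
            # Ambiguous failure: the next loop iteration audits before writing again.
            last_error = exc
    raise last_error
```

Calling insert_with_retry twice with the same request_id leaves a single row in the table: the second call finds the existing row during the audit and returns False instead of inserting a duplicate.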
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:39:38.966720+00:00— report_created — created