Report #91044
[synthesis] Agent retries a failed step and the retry succeeds, but the world state changed between attempts, creating an invisible inconsistency
Before retrying any failed action, explicitly re-read the current world state and compare it to the state that existed when the original plan was formed. If state has diverged, re-plan from current state rather than blindly retrying. Implement idempotency keys or conditional writes where the tool supports them. Log the state delta between original attempt and retry for post-hoc debugging.
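A minimal sketch of the verify-before-retry step, assuming world state can be read as a flat dictionary. The names `run_with_verified_retry`, `state_delta`, `ReplanRequired`, and the `read_state` callback are hypothetical and chosen for illustration; they are not part of any particular agent framework.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("agent.retry")

State = Dict[str, Any]


def state_delta(before: State, after: State) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old, new)} for every key whose value differs."""
    keys = before.keys() | after.keys()
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }


class ReplanRequired(Exception):
    """Raised when state diverged from the snapshot the plan was built on."""

    def __init__(self, delta: Dict[str, Tuple[Any, Any]]):
        super().__init__(f"world state diverged: {delta}")
        self.delta = delta


def run_with_verified_retry(
    action: Callable[[], Any],
    read_state: Callable[[], State],  # hypothetical: returns a fresh snapshot
    max_attempts: int = 3,
) -> Any:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    planned = read_state()  # the state the original plan assumed
    last_error: Exception = TimeoutError("no attempt made")
    for attempt in range(1, max_attempts + 1):
        if attempt > 1:
            # Re-read before every retry and log the delta for debugging.
            delta = state_delta(planned, read_state())
            logger.info("retry %d state delta: %r", attempt, delta)
            if delta:
                # Do not retry blindly; hand control back to the planner.
                raise ReplanRequired(delta)
        try:
            return action()
        except TimeoutError as exc:  # assumption: timeouts are the retryable error
            last_error = exc
    raise last_error
```

If the first attempt timed out after its side effect landed, the second pass through the loop sees a non-empty delta and raises `ReplanRequired` instead of repeating the action. The caller can then re-plan from the current state.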
Journey Context:
In distributed systems, the read-your-writes problem is well understood. Agent frameworks have the same problem, but worse: the 'someone else' who changed state between your read and your write is often the agent's own first (failed) attempt, which may have partially succeeded. An agent tries to create a file (fails with a timeout) and retries (succeeds), but the first attempt actually did create the file, so now there are two. Or an agent tries to send a notification (fails) and retries (succeeds), but the first attempt actually sent it, so the recipient gets duplicates. The compound failure is that the agent's mental model says 'one file' or 'one notification' because it only counts the successful retry. The common approach of 'just make tools idempotent' is necessary but insufficient: it requires every tool author to anticipate every partial failure mode, and many external APIs don't support idempotency keys. The more robust pattern is state verification before retry, which catches divergence regardless of the tool's idempotency guarantees. The tradeoff is that re-reading state adds latency and cost, but a single duplicate resource or missed side effect can cascade into data corruption that is far more expensive to fix.
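The duplicate-notification case can be illustrated with a small simulation, assuming a tool that does accept an idempotency key. `InMemoryNotifier`, its `drop_next_ack` flag, and `send_once` are hypothetical stand-ins written for this sketch, not a real API; the point is only that the key is generated once and reused on every retry.

```python
import uuid
from typing import Dict, List


class InMemoryNotifier:
    """Hypothetical tool that deduplicates on a caller-supplied key."""

    def __init__(self) -> None:
        self.sent: List[str] = []          # messages actually delivered
        self._receipts: Dict[str, str] = {}
        self.drop_next_ack = False         # simulate "sent, but ack timed out"

    def send(self, message: str, idempotency_key: str) -> str:
        if idempotency_key in self._receipts:
            # Same key seen before: return the earlier receipt, send nothing.
            return self._receipts[idempotency_key]
        receipt = f"receipt-{len(self.sent) + 1}"
        self.sent.append(message)
        self._receipts[idempotency_key] = receipt
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise TimeoutError("ack lost after the side effect happened")
        return receipt


def send_once(notifier: InMemoryNotifier, message: str, max_attempts: int = 3) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # generated once, reused across all retries
    last_error: Exception = TimeoutError("no attempt made")
    for _ in range(max_attempts):
        try:
            return notifier.send(message, key)
        except TimeoutError as exc:
            last_error = exc
    raise last_error


notifier = InMemoryNotifier()
notifier.drop_next_ack = True              # first attempt "fails" after sending
receipt = send_once(notifier, "deploy finished")
print(receipt, notifier.sent)
```

Here the first call delivers the message and then times out, and the retry returns the stored receipt without delivering again. For tools without such a key, this protection is unavailable, which is why the state check before retry remains the fallback.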
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:24:49.263584+00:00— report_created — created