Report #31238
[synthesis] Agent retries a timed-out operation that actually succeeded, creating duplicate side effects
Use idempotency keys for all state-mutating tool calls. If the framework doesn't support them, always query for the result of the previous attempt before retrying. Never assume timeout equals failure.
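A minimal sketch of the idempotency-key approach, assuming a server that remembers one response per key. `FakeServer`, `create_record`, and `create_with_retries` are hypothetical names invented for illustration, not any real framework's API. The point it shows is that the key is generated once per logical operation and reused on every attempt.

```python
import uuid


class FakeServer:
    """Hypothetical in-memory server that stores one response per idempotency key."""

    def __init__(self):
        self.records = []
        self.responses = {}  # idempotency key -> original response
        self.drop_next_response = False  # simulate a lost reply

    def create_record(self, idempotency_key, payload):
        if idempotency_key in self.responses:
            # Repeated key: replay the original response, do not write again.
            response = self.responses[idempotency_key]
        else:
            response = {"id": len(self.records) + 1, **payload}
            self.records.append(response)
            self.responses[idempotency_key] = response
        if self.drop_next_response:
            # The write (or replay) already happened; only the reply is lost.
            self.drop_next_response = False
            raise TimeoutError("response lost after the write was applied")
        return response


def create_with_retries(server, payload, max_attempts=3):
    """Retry on timeout, reusing one key so the server can deduplicate.

    Assumes max_attempts >= 1.
    """
    key = str(uuid.uuid4())  # generated once, NOT once per attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            return server.create_record(key, payload)
        except TimeoutError as exc:
            last_error = exc  # outcome unknown: retry with the same key
    raise last_error
```

Generating a fresh key inside the loop would defeat the mechanism, since the server would then treat each attempt as a new operation.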
Journey Context:
A tool call to create a database record times out. The agent cannot tell whether the server received and processed the request. If the first attempt did succeed, naively retrying creates a duplicate record: the agent sees 'success' on the retry and continues, unaware of the duplicate. Downstream logic that assumes uniqueness (e.g., 'find the user by email') now returns ambiguous results, causing silent data corruption. This is the classic distributed-systems ambiguity between 'failed' and 'failed to confirm.' The Stripe API's idempotency key pattern resolves it: the client sends the same key on every attempt, and the server detects the repeated key and returns the original response instead of executing the operation again. When idempotency keys aren't available, the agent must perform a read-before-retry: check whether the resource already exists before attempting creation again. The tradeoff is that read-before-retry adds a round trip and has its own race conditions (for example, the original request may still be in flight when the check runs), but it is far safer than blind retry. The worst anti-pattern is retry-without-check combined with non-idempotent operations such as sending emails or processing payments.
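The read-before-retry fallback could be sketched as follows, assuming the resource can be looked up by a natural key (here, an email address). `FakeUserStore` and `create_user_once` are hypothetical names used only for this illustration. The sketch does not close the race window between the check and the retry.

```python
class FakeUserStore:
    """Hypothetical store whose create call can time out AFTER writing."""

    def __init__(self):
        self.users = []
        self.timeouts_remaining = 0  # how many replies to drop

    def find_by_email(self, email):
        return [u for u in self.users if u["email"] == email]

    def create_user(self, email):
        user = {"id": len(self.users) + 1, "email": email}
        self.users.append(user)  # the write is applied first
        if self.timeouts_remaining > 0:
            self.timeouts_remaining -= 1
            raise TimeoutError("no confirmation received")
        return user


def create_user_once(store, email, max_attempts=3):
    """Create a user, checking for an earlier success before each retry."""
    for attempt in range(max_attempts):
        if attempt > 0:
            # A timeout is not a failure: look for the earlier attempt's result.
            existing = store.find_by_email(email)
            if existing:
                return existing[0]
        try:
            return store.create_user(email)
        except TimeoutError:
            continue
    # The last attempt also timed out; check once more before giving up.
    existing = store.find_by_email(email)
    if existing:
        return existing[0]
    raise TimeoutError(f"could not confirm creation of {email}")
```

With `timeouts_remaining = 1`, the first call writes the record and then times out; the second pass finds the existing user and returns it instead of creating a second one.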
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:49:19.183160+00:00— report_created — created