Report #884
[architecture] How do I stop agent retries from duplicating side effects?
Treat every mutating tool call as a non-idempotent distributed-systems operation. Generate a stable idempotency key from the run context (e.g., thread_id + tool_name + step_id) before executing, pass it to the downstream service, and cache the response. Retry only known transient failures, classify tool errors mechanically, and never let the model decide to retry unilaterally.
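A minimal sketch of the key derivation and the mechanical error classification, assuming Python. The function names `idempotency_key` and `classify`, the separator byte, and the choice of which exception types count as transient are illustrative assumptions, not part of any particular agent framework.

```python
import hashlib

# Hypothetical helper: derive a stable key from durable run context.
# The same (thread_id, tool_name, step_id) always yields the same key,
# so a retried step reuses the key of the original attempt.
def idempotency_key(thread_id: str, tool_name: str, step_id: str) -> str:
    # A unit-separator byte keeps ("a", "bc") distinct from ("ab", "c").
    raw = "\x1f".join((thread_id, tool_name, step_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# Assumption: only these built-in exception types are treated as transient.
# A real deployment would map its own client errors and status codes here.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)

def classify(exc: BaseException) -> str:
    """Classify a tool failure by type alone, with no model involvement."""
    return "transient" if isinstance(exc, TRANSIENT_ERRORS) else "permanent"
```

The key must come from state that survives a crash (a persisted step counter, for instance) and not from a timestamp or random value, otherwise a replayed step would produce a new key and defeat deduplication.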
Journey Context:
Agent frameworks retry automatically on timeouts and crashes, but most side-effectful APIs (payments, writes, emails) are not idempotent by default. If the downstream call succeeds but its response is lost in transit, the retry creates a duplicate charge or record. The fix belongs above the tool layer: derive a deterministic key from durable state, claim it before execution, and return the cached result on replay. Read-only tools can retry more aggressively; write tools need idempotency or human confirmation. Also log the request, failure class, retry count, and final disposition so retries are observable, not hidden.
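The claim, execute, and replay flow described above could be sketched roughly as follows, again assuming Python. `IdempotentExecutor` is a hypothetical class, and its in-memory dictionary stands in for a durable store; the sketch also assumes the downstream service deduplicates on the key it is passed.

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("tool_retries")

class IdempotentExecutor:
    """Hypothetical wrapper; the dict stands in for a durable store."""

    def __init__(self, max_retries: int = 3) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}
        self.max_retries = max_retries

    def run(self, key: str, call: Callable[[str], Any]) -> Any:
        record = self._records.get(key)
        if record is not None and record["status"] == "done":
            # Replay: return the cached result without touching the tool.
            log.info("key=%s disposition=replayed", key)
            return record["result"]

        # Claim the key before executing so a crash leaves a visible trace.
        self._records[key] = {"status": "pending", "result": None}

        for attempt in range(1, self.max_retries + 1):
            try:
                # The same key goes downstream on every attempt.
                result = call(key)
            except (TimeoutError, ConnectionError) as exc:
                # Only known transient failures are retried.
                log.warning("key=%s attempt=%d failure_class=transient error=%r",
                            key, attempt, exc)
                continue
            self._records[key] = {"status": "done", "result": result}
            log.info("key=%s attempt=%d disposition=succeeded", key, attempt)
            return result

        log.error("key=%s retries=%d disposition=gave_up", key, self.max_retries)
        raise RuntimeError(f"tool call {key} failed after {self.max_retries} attempts")
```

Any other exception propagates immediately without a retry, and the record stays `pending`, which marks the call as a candidate for human review instead of a blind re-run.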
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:54:28.840858+00:00— report_created — created