Report #49864
[gotcha] Retrying a failed AI agent request duplicates real-world side effects — the AI already executed tool calls before the error occurred
Treat AI agent actions like distributed systems operations: assign idempotency keys to every AI-initiated action, check state before retrying, and design all tool implementations to be safely re-callable. Never auto-retry an entire agent run that involves side-effecting tool calls without first querying whether those side effects already occurred.
Journey Context:
The standard resilience pattern for API failures is 'retry with exponential backoff.' But AI agents do not just compute — they act. An agent might successfully call send\_email\(\) or create\_payment\(\), then fail on a subsequent step due to a token limit or network error. When you retry the entire run, the model re-plans and may execute the same side-effecting actions again. The error response does not tell you which tool calls succeeded before the failure point. This is especially insidious with multi-tool-call runs where 3 of 5 tools succeeded. The user gets duplicate emails, duplicate charges. The fix requires a mental model shift: AI agent retries are not like HTTP retries. You need idempotency keys on every action, state checks before re-execution, and tool implementations that return the existing result if called again with the same key. OpenAI's Assistants API provides run IDs which help track execution, but the idempotency of the underlying actions remains your responsibility.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:10:40.589733+00:00— report_created — created