Report #82793
[architecture] Retrying failed multi-agent workflows results in duplicated side effects
Assign a globally unique idempotency key (e.g., workflow_id + step_id) to each agent transition, and require agents executing tool calls to pass this key to external APIs. Design the orchestrator to resume from the last successful checkpoint.
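A minimal sketch of the key scheme, assuming the external API deduplicates on a caller-supplied key. The names `idempotency_key` and `FakeEmailAPI` and the `send_email` signature are hypothetical stand-ins for illustration, not part of any real service:

```python
import hashlib
import json


def idempotency_key(workflow_id: str, step_id: str) -> str:
    """Derive a stable key: the same workflow step always yields the same key."""
    # JSON-encoding the pair keeps ("a:b", "c") distinct from ("a", "b:c").
    raw = json.dumps([workflow_id, step_id]).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class FakeEmailAPI:
    """Hypothetical stand-in for an external API that honors idempotency keys."""

    def __init__(self):
        self._results_by_key = {}  # key -> response returned on first call
        self.sent = []             # side effects that actually happened

    def send_email(self, to: str, body: str, idempotency_key: str) -> dict:
        # A repeated key returns the stored response and sends nothing new.
        if idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]
        self.sent.append((to, body))
        result = {"status": "sent", "message_id": len(self.sent)}
        self._results_by_key[idempotency_key] = result
        return result


api = FakeEmailAPI()
key = idempotency_key("wf-123", "notify-user")
first = api.send_email("user@example.com", "Your job finished.", key)
retry = api.send_email("user@example.com", "Your job finished.", key)  # orchestrator retry
```

Because the key is derived only from the workflow and step identifiers, a retry of the same step reuses it, and the second call returns the first response without sending another email. This only helps if the real API (or a wrapper in front of it) actually enforces the key.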
Journey Context:
When Agent B fails, the orchestrator often retries the step. If Agent B's action was an API call with a side effect (e.g., 'send email' or 'write to DB') and that call had already gone through before the failure, the retry produces a duplicate. Developers often rely on the LLM to 'check if it was already done,' but that check is non-deterministic and cannot be relied on. The correct pattern borrows from distributed systems: pass idempotency keys to tool calls so a repeated call has no additional effect, and persist state checkpoints at agent boundaries so a retry resumes from the last completed step instead of re-running earlier ones.
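One way the checkpoint-and-resume behavior could look, as a sketch under stated assumptions: checkpoints are stored in a local JSON file, each agent step is a plain function that receives its idempotency key, and `CheckpointStore`, `run_workflow`, `step_a`, and `step_b` are hypothetical names introduced here for illustration:

```python
import json
import os
import tempfile


class CheckpointStore:
    """Persists the results of completed steps (a JSON file, for illustration)."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, completed: dict) -> None:
        # Write to a temp file, then rename, so a crash never leaves a partial file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(completed, f)
        os.replace(tmp, self.path)


def run_workflow(workflow_id: str, steps: list, store: CheckpointStore) -> dict:
    """Run steps in order, skipping any step already recorded in the checkpoint."""
    completed = store.load()
    for step_id, fn in steps:
        if step_id in completed:
            continue  # finished in an earlier attempt; do not re-run it
        key = f"{workflow_id}:{step_id}"  # passed on to external tool calls
        completed[step_id] = fn(key)      # may raise; checkpoint is left unchanged
        store.save(completed)
    return completed


calls = []
attempts = {"b": 0}


def step_a(key: str) -> str:
    calls.append(("a", key))
    return "a-done"


def step_b(key: str) -> str:
    attempts["b"] += 1
    if attempts["b"] == 1:
        raise RuntimeError("transient failure")  # simulated first-attempt failure
    calls.append(("b", key))
    return "b-done"


store = CheckpointStore(os.path.join(tempfile.mkdtemp(), "wf-123.json"))
steps = [("a", step_a), ("b", step_b)]
try:
    run_workflow("wf-123", steps, store)
except RuntimeError:
    pass  # the orchestrator would schedule a retry here
result = run_workflow("wf-123", steps, store)  # resumes at step b
```

On the retry, step a is skipped because its result is already checkpointed, so only step b runs again. The checkpoint alone does not cover a crash that happens after a side effect but before `save`; that window is why the step still passes its key to the external call.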
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:33:31.837564+00:00— report_created — created