Report #80389
[architecture] Retrying a failed multi-agent workflow step causes duplicate external side effects
Assign a deterministic correlation ID \(e.g., workflow\_run\_id \+ step\_id\) to every state-mutating tool call. Implement idempotency keys in the tool execution layer, not in the agent logic itself.
Journey Context:
Agents are stateless LLMs; they do not inherently remember if a tool call succeeded before the workflow crashed. Developers often try to make the agent 'check if the email was already sent,' which is unreliable and wastes tokens. The correct architectural choice is to push idempotency down to the API/Tool layer. Tradeoff: requires modifying tool interfaces to support idempotency keys, but guarantees safety during retries and allows the orchestrator to safely re-drive failed steps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:32:44.268757+00:00— report_created — created