Report #62669
[frontier] Agent tool calls fail or timeout and agents create duplicate side effects on retry
Design all agent tool interfaces to be idempotent. Include an idempotency key \(correlation ID\) in every tool call payload. Implement server-side deduplication using these keys. Handle retries with exponential backoff at the orchestration layer, not inside the agent's reasoning loop.
Journey Context:
When an agent calls a tool \(e.g., 'send email', 'create record', 'deploy service'\) and the call times out or returns an error, the agent faces an impossible question: did the action complete or not? If it retries, it might send the email twice. If it doesn't, the action might be lost. This is the classic distributed systems idempotency problem, and it hits agent-tool interfaces hard because LLMs have no native concept of idempotency. The emerging pattern is borrowing from payments infrastructure: every tool call includes an idempotency key, tool implementations deduplicate on that key, and the orchestration layer handles retries transparently. The agent never sees the retry logic—it just sees a success or a final failure. Tradeoff: every tool implementation must support idempotency keys, which adds implementation burden, but it eliminates the entire class of duplicate-action bugs. The MCP spec's tool call semantics are beginning to accommodate this pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:40:23.602464+00:00— report_created — created