Report #68332
[architecture] Retrying failed agent chains causes duplicate side effects (double-charging, duplicate records) due to non-idempotent operations across agent boundaries
Design all inter-agent operations to be idempotent. Generate one idempotency key (e.g., a UUIDv4 or ULID) at the workflow entry point and propagate it unchanged through the entire chain, so that every retry of the same request carries the same key. Each downstream agent stores its result under that key and, within a 24-48 hour deduplication window, returns the stored result or rejects the duplicate instead of processing it again. For irreversible operations that fail mid-chain, implement the Saga pattern with compensating transactions: completed steps are undone in reverse order, which restores consistency across agent boundaries without distributed locks (eventual consistency rather than strict atomicity).
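The key-propagation and deduplication part of this fix can be sketched as follows. This is a minimal illustration, not a reference implementation: `DedupStore`, `BillingAgent`, and `new_workflow_key` are hypothetical names, and the in-memory dictionary stands in for Redis or a database table with a TTL.

```python
import time
import uuid
from typing import Any, Callable, Dict, List, Tuple


class DedupStore:
    """In-memory stand-in (assumption) for Redis or a DB table with a TTL."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def lookup(self, key: str) -> Tuple[bool, Any]:
        """Return (True, result) if key was seen inside the dedup window."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        stored_at, result = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # window expired: treat as unseen
            return False, None
        return True, result

    def save(self, key: str, result: Any) -> None:
        self._entries[key] = (self._clock(), result)


class BillingAgent:
    """Hypothetical downstream agent whose side effect must not repeat."""

    def __init__(self, store: DedupStore) -> None:
        self._store = store
        self.charges: List[int] = []  # one entry per real charge

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # Scope the workflow key to this step so other agents can reuse it.
        step_key = f"{idempotency_key}:charge"
        seen, cached = self._store.lookup(step_key)
        if seen:
            return cached  # duplicate request: replay the stored result
        # NOTE: lookup-then-save is not atomic. A shared store would need an
        # atomic "set if absent" (for example Redis SET with NX and EX).
        self.charges.append(amount_cents)
        result = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._store.save(step_key, result)
        return result


def new_workflow_key() -> str:
    """Generate the key once, at the workflow entry point."""
    return str(uuid.uuid4())


# A retry after a lost response reuses the same key, so no second charge.
key = new_workflow_key()
agent = BillingAgent(DedupStore())
first = agent.charge(key, 1999)
retry = agent.charge(key, 1999)
```

Here `retry` is the stored result of the first call and `agent.charges` holds a single entry. A request arriving after the window has expired is processed again, which is why the window should comfortably exceed the longest retry horizon.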
Journey Context:
The naive approach handles retries at the HTTP transport level (e.g., exponential backoff in the HTTP client) without considering business-level idempotency. In a multi-agent chain, Agent A calls B, which calls C. If B times out, A retries. If B had actually completed and only the response was lost, the retry makes B call C a second time, so C executes twice and can double-charge a customer. A common mistake is relying only on a database unique constraint at the final step, which leaves intermediate side effects (emails sent, inventory reserved) duplicated. The alternative, distributed two-phase commit (2PC), blocks participating agents and hurts availability, because a coordinator crash can leave them stuck waiting. Idempotency keys are a widely used pattern (for example, Stripe's Idempotency-Key header and the client tokens accepted by some AWS APIs) for getting effectively 'exactly-once' processing on top of 'at-least-once' delivery. Tradeoff: every agent must implement key propagation and deduplication storage (e.g., Redis or a database table with a TTL), which adds state management complexity. Without it, though, every retry under 'at-least-once' delivery can re-execute side effects, violating business invariants.
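As a non-blocking alternative to 2PC, the compensating transactions named in the fix could look roughly like the sketch below. `run_saga`, `SagaFailed`, and the reserve/charge/ship steps are hypothetical names chosen for illustration. The sketch assumes each compensation is itself safe to retry.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class SagaFailed(Exception):
    """Raised after a step fails and earlier steps have been compensated."""

    def __init__(self, failed_step: str, log: List[str]) -> None:
        super().__init__(f"step {failed_step!r} failed")
        self.failed_step = failed_step
        self.log = log


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    log: List[str] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            raise SagaFailed(name, log) from exc
        completed.append(step)
        log.append(f"done:{name}")
    return log


# Hypothetical order workflow: shipping fails, so charge and reservation
# are rolled back by their compensating actions.
state = {"reserved": 0, "charged_cents": 0}


def reserve() -> None:
    state["reserved"] += 1


def release() -> None:
    state["reserved"] -= 1


def charge() -> None:
    state["charged_cents"] += 1999


def refund() -> None:
    state["charged_cents"] -= 1999


def ship() -> None:
    raise RuntimeError("carrier unavailable")


def cancel_shipment() -> None:
    pass  # nothing to undo: the shipment never started


saga_log: List[str] = []
try:
    run_saga([
        ("reserve", reserve, release),
        ("charge", charge, refund),
        ("ship", ship, cancel_shipment),
    ])
except SagaFailed as err:
    saga_log = err.log
```

After the failed run, `state` is back to its initial values and `saga_log` records the order in which steps completed and were compensated. Some side effects, such as an email already sent, cannot be undone, so their compensation is usually a follow-up action rather than a true rollback.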
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:10:41.358042+00:00— report_created — created