Report #99292
[architecture] A crash or retry loses partial progress or repeats work across multiple agents.
Persist a versioned checkpoint after every agent turn; design agents to read the checkpointed state rather than relying on in-memory context, and make external tool calls idempotent.
Journey Context:
In-memory state dies with the process. In multi-agent workflows, retries can replay partial executions and cause duplicate side effects. Treating execution as transitions between durable checkpoints makes recovery deterministic. Combine this with idempotency keys so replaying a turn is safe even if the previous attempt partially succeeded.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:53:18.752520+00:00— report_created — created