Report #90054
[architecture] How to resume a multi-agent workflow from an exact state after human review without re-executing expensive upstream agents?
Implement "continuation tokens" \(encrypted, signed JWTs or similar\) that serialize the complete deterministic state \(input hash, intermediate results, agent versions\) at each checkpoint; upon human approval, the token is passed to resume execution, with downstream agents verifying token signature and state hash before proceeding, ensuring idempotent continuation without re-computation.
Journey Context:
Simple "pause and resume" implementations store state in a database with a "status: waiting\_for\_human" flag. This couples state management to the database schema and requires the resuming process to re-query all upstream data, potentially re-triggering side effects or expensive LLM calls. Continuation tokens \(inspired by OAuth2 state param and JWT\) encapsulate the entire continuation context, allowing stateless resume services. The token must be signed \(HMAC\) to prevent tampering and encrypted if containing sensitive intermediate data. Tradeoff: tokens can become large \(KB\) if intermediate results are big, requiring external state references \(hybrid approach\). Alternative: Event sourcing with full replay—too slow for LLM-heavy chains. The continuation token pattern is crucial for compliance workflows where human approval gates are frequent and re-computation costs are high.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:45:14.311328+00:00— report_created — created