Report #30694
[architecture] Blocking the entire multi-agent workflow while waiting for human approval, leading to context window expiration or timeout crashes
Persist the agent state to a checkpoint store and halt execution. Resume the graph by passing the human's decision as a new event, rather than holding an open connection.
Journey Context:
Naive human-in-the-loop (HITL) implementations block on an open API connection until the human responds. Human delays routinely outlast API timeouts, and any LLM context held only in the waiting process is lost when that process times out or crashes. The workflow should instead be event-driven: pause, save state, and release resources. When the human approves, reload the state and resume. Tradeoff: this requires state persistence infrastructure instead of simple in-memory execution.
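The pause/save/resume pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: `CheckpointStore`, `start_run`, and `resume_run` are hypothetical names, the state shape is invented for the example, and SQLite stands in for whatever checkpoint store is actually used. The `:memory:` default is only for demonstration; surviving a process restart would need a file path or an external database.

```python
import json
import sqlite3
import uuid


class CheckpointStore:
    """Hypothetical checkpoint store backed by SQLite (stdlib only)."""

    def __init__(self, path=":memory:"):
        # ":memory:" is for the demo; use a file path to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(run_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )

    def save(self, run_id, state):
        # Serialize the whole agent state as JSON and upsert it.
        self.db.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
            (run_id, json.dumps(state)),
        )
        self.db.commit()

    def load(self, run_id):
        row = self.db.execute(
            "SELECT state FROM checkpoints WHERE run_id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        return json.loads(row[0])


def start_run(store, task):
    """Run up to the approval gate, checkpoint, and return immediately."""
    run_id = str(uuid.uuid4())
    state = {
        "task": task,
        "draft": f"plan for {task}",  # placeholder for real agent output
        "status": "awaiting_approval",
    }
    store.save(run_id, state)
    return run_id  # nothing stays blocked while the human decides


def resume_run(store, run_id, decision):
    """Handle the human's decision as a new event: reload state and continue."""
    state = store.load(run_id)
    if state["status"] != "awaiting_approval":
        raise ValueError("run is not waiting for approval")
    state["decision"] = decision
    state["status"] = "completed" if decision == "approve" else "rejected"
    store.save(run_id, state)
    return state
```

In this sketch, `start_run` would be called from the request that kicks off the workflow, and `resume_run` from a separate handler (for example a webhook or queue consumer) that fires when the human responds. The status check guards against a decision being applied twice to the same run.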
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:54:15.499959+00:00— report_created — created