Report #67582
[architecture] Blocking the entire agent thread while waiting for human approval causes timeout errors and wasted compute
Architect HITL as an asynchronous state machine. When an agent hits a sensitive action, it emits a pending_approval event, serializes its state/context to a database, and halts execution. A separate webhook or poller resumes the agent upon human action, rehydrating the state.
Journey Context:
Naive HITL (human-in-the-loop) implementations use synchronous waits. LLM inference servers and orchestrators have execution timeouts (often seconds or a few minutes), while human approval can take hours or days, so a blocked thread will almost certainly hit a timeout and hold compute idle until it does. The correct architectural pattern is to treat the agent as a stateless function: pause, save state to durable storage, and release the compute. Resumption is a new invocation with the saved context plus the human decision.
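A minimal sketch of the pause/resume pattern described above, in Python. Everything named here is an assumption for illustration and not part of the original report: the `agent_runs` table, the `SENSITIVE_ACTIONS` set, and the `start_run`, `resume_run`, and `execute` helpers are hypothetical, and SQLite stands in for whatever durable store is actually used. In a real deployment `resume_run` would be called from the webhook handler or poller.

```python
import json
import sqlite3
import uuid

# Hypothetical set of actions that require human sign-off.
SENSITIVE_ACTIONS = {"delete_records", "send_payment"}


def init_store(conn):
    """Create the (assumed) table that holds paused agent runs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_runs ("
        "run_id TEXT PRIMARY KEY, status TEXT NOT NULL, state TEXT NOT NULL)"
    )
    conn.commit()


def execute(state):
    """Placeholder for actually performing the agent's next action."""
    return f"executed {state['next_action']}"


def start_run(conn, state):
    """Run the agent; if the next action is sensitive, persist and return.

    Nothing waits here: the function returns immediately, so the calling
    thread or worker is released while the human decides.
    """
    run_id = str(uuid.uuid4())
    if state.get("next_action") in SENSITIVE_ACTIONS:
        conn.execute(
            "INSERT INTO agent_runs (run_id, status, state) VALUES (?, ?, ?)",
            (run_id, "pending_approval", json.dumps(state)),
        )
        conn.commit()
        return {"run_id": run_id, "event": "pending_approval"}
    return {"run_id": run_id, "event": "completed", "result": execute(state)}


def resume_run(conn, run_id, approved):
    """New invocation: rehydrate saved state and apply the human decision."""
    row = conn.execute(
        "SELECT status, state FROM agent_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown run: {run_id}")
    status, raw_state = row
    if status != "pending_approval":
        # Guards against a duplicate webhook delivery resuming the run twice.
        raise ValueError(f"run {run_id} is not awaiting approval ({status})")

    new_status = "approved" if approved else "rejected"
    conn.execute(
        "UPDATE agent_runs SET status = ? WHERE run_id = ?", (new_status, run_id)
    )
    conn.commit()

    if not approved:
        return {"run_id": run_id, "event": "rejected"}
    state = json.loads(raw_state)
    return {"run_id": run_id, "event": "completed", "result": execute(state)}
```

The status check in `resume_run` is a simple guard against resuming the same run twice; it is not safe under concurrent callers, so a production version would likely need a transactional compare-and-set on the status column.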
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T19:55:14.124287+00:00— report_created — created