Report #93280
[architecture] Inserting human-in-the-loop approvals at every agent step causes workflow fatigue
Gate HITL checkpoints only on state-mutating boundaries (writes to external systems, irreversible actions) rather than on read-only or generative steps. Use an Intent Verification pattern in which the orchestrator pauses only when the proposed tool call crosses a trust boundary.
Journey Context:
A naive approach to safety is to require human approval for every agent transition. This defeats the purpose of automation, and reviewers develop approval blindness (clicking approve without reading). The architectural solution is to classify tools and outputs by their side effects. A search query is read-only (no HITL). A database update is reversible (maybe low-risk HITL). An email send or payment is irreversible (mandatory HITL). The tradeoff is that a malicious prompt injection might trick an agent into a seemingly read-only action that exfiltrates data, so data exfiltration sinks must also be gated.
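The classification above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `SideEffect`, `Gate`, `ToolSpec`, `required_gate`, `run_step`, and the example tool registry are all hypothetical, and the `fetch_url` entry assumes that a read-only fetch to an arbitrary URL can act as an exfiltration sink (for example, by carrying data in the query string).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class SideEffect(Enum):
    READ_ONLY = "read_only"        # e.g. a search query
    REVERSIBLE = "reversible"      # e.g. a database update that can be rolled back
    IRREVERSIBLE = "irreversible"  # e.g. sending an email or a payment


class Gate(Enum):
    NONE = "none"            # run without pausing
    LOW_RISK = "low_risk"    # lightweight review; policy is left to the approver
    MANDATORY = "mandatory"  # always pause for a human


@dataclass(frozen=True)
class ToolSpec:
    name: str
    side_effect: SideEffect
    # Assumption: set for tools that can move data across the trust boundary,
    # even when they look read-only.
    exfiltration_sink: bool = False


def required_gate(tool: ToolSpec) -> Gate:
    """Map a tool's declared side effects to the approval gate it needs."""
    if tool.side_effect is SideEffect.IRREVERSIBLE or tool.exfiltration_sink:
        return Gate.MANDATORY
    if tool.side_effect is SideEffect.REVERSIBLE:
        return Gate.LOW_RISK
    return Gate.NONE


def run_step(
    tool: ToolSpec,
    args: dict,
    execute: Callable[[dict], Any],
    approve: Callable[[ToolSpec, dict, Gate], bool],
) -> Optional[Any]:
    """Pause for approval only when the proposed call needs a gate."""
    gate = required_gate(tool)
    if gate is not Gate.NONE and not approve(tool, args, gate):
        return None  # rejected: the tool is never executed
    return execute(args)


# Hypothetical registry used only for illustration.
TOOLS = {
    "web_search": ToolSpec("web_search", SideEffect.READ_ONLY),
    "db_update": ToolSpec("db_update", SideEffect.REVERSIBLE),
    "send_email": ToolSpec("send_email", SideEffect.IRREVERSIBLE),
    "fetch_url": ToolSpec("fetch_url", SideEffect.READ_ONLY, exfiltration_sink=True),
}
```

In this sketch the orchestrator would call `run_step` for every proposed tool call. The `approve` callback is where a human prompt would live, and it receives the gate level so that low-risk calls can be handled more lightly than mandatory ones. Declaring side effects on the tool rather than inferring them from the model's output keeps the gating decision out of reach of injected prompt text.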
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:09:26.763934+00:00— report_created — created