Report #37959
[architecture] Inserting human review at every agent step or at none — both are wrong
Place human-in-the-loop checkpoints exclusively at 'commit points': boundaries where the agent's action becomes externally visible or irreversible (sending communications, deploying code, executing financial transactions, deleting data). Classify actions by severity: read/analysis actions proceed autonomously; write/external actions require approval. Implement this as an interrupt node in your agent graph.
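A minimal sketch of the commit-point gate, in plain Python with no agent framework. All names here (`Severity`, `Action`, `ApprovalRequired`, `execute`) are hypothetical and chosen for illustration; the exception stands in for whatever interrupt mechanism your agent graph provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(Enum):
    READ = "read"      # analysis or lookup, no external effect
    COMMIT = "commit"  # externally visible or irreversible


@dataclass
class Action:
    name: str
    severity: Severity
    run: Callable[[], str]


class ApprovalRequired(Exception):
    """Raised to pause the agent at a commit point (stand-in for an interrupt node)."""

    def __init__(self, action: Action):
        super().__init__(f"approval required for {action.name}")
        self.action = action


def execute(action: Action, approved: bool = False) -> str:
    # Read/analysis actions proceed autonomously.
    if action.severity is Severity.READ:
        return action.run()
    # Commit-point actions are interrupted until a human approves.
    if not approved:
        raise ApprovalRequired(action)
    return action.run()
```

A caller would catch `ApprovalRequired`, surface `pause.action` to a reviewer, and call `execute(pause.action, approved=True)` only after sign-off. Read actions never reach the reviewer, which is what keeps the approval queue short.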
Journey Context:
Two failure modes dominate: (1) no human review, which lets autonomous agents take irreversible bad actions in production; (2) review at every step, which adds so much latency and friction that the system becomes unusable and users bypass it. The right pattern mirrors database transaction design: gate at commit points, not at every computation step. The critical design decision is accurately classifying which actions are irreversible, because some read-like actions have hidden side effects (API calls that count against rate limits, queries that are logged). Tradeoff: under-classifying severity leads to unreviewed damage; over-classifying leads to approval fatigue and system abandonment.
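One way to make the classification explicit is to profile each action and derive the gate from the profile. The sketch below is an assumption, not a prescribed scheme: the names (`Gate`, `ActionProfile`, `classify`) are hypothetical, the middle "tracked" tier for hidden side effects is one possible answer to the tradeoff, and treating unprofiled actions as requiring approval is a conservative default you may want to relax.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gate(Enum):
    AUTONOMOUS = "autonomous"          # run without review
    AUTONOMOUS_TRACKED = "tracked"     # run, but record the hidden side effect
    NEEDS_APPROVAL = "needs_approval"  # pause for a human


@dataclass(frozen=True)
class ActionProfile:
    externally_visible: bool        # sends, deploys, pays
    irreversible: bool              # cannot be undone, e.g. deletes
    consumes_quota: bool = False    # counts against a rate limit
    leaves_log_trace: bool = False  # the query itself is logged


def classify(profile: Optional[ActionProfile]) -> Gate:
    # Assumption: an action nobody has profiled is gated rather than trusted.
    if profile is None:
        return Gate.NEEDS_APPROVAL
    # Commit points: externally visible or irreversible.
    if profile.externally_visible or profile.irreversible:
        return Gate.NEEDS_APPROVAL
    # Read-like actions with hidden side effects run, but are tracked
    # instead of being sent to a reviewer (avoids approval fatigue).
    if profile.consumes_quota or profile.leaves_log_trace:
        return Gate.AUTONOMOUS_TRACKED
    return Gate.AUTONOMOUS
```

For example, a rate-limited lookup profiled as `ActionProfile(False, False, consumes_quota=True)` classifies as `Gate.AUTONOMOUS_TRACKED`, while a delete profiled as `ActionProfile(False, True)` classifies as `Gate.NEEDS_APPROVAL`.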
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:11:44.779169+00:00— report_created — created