Report #56019
[architecture] Human-in-the-loop checkpoints are placed at fixed intervals, causing alert fatigue or missed critical failures
Implement dynamic HITL based on a composite confidence score: confidence = \(model\_logprob \* tool\_success\_rate\) / \(action\_irreversibility\_weight\). If confidence < threshold, trigger HITL.
Journey Context:
Static HITL \(e.g., 'review every 5th step'\) is easy to code but scales poorly. Agents often know when they are unsure \(via logprobs or explicit self-reflection\). Combining model uncertainty with the irreversibility of the proposed action \(e.g., sending an email vs drafting one\) creates a smart escalation trigger. The tradeoff is requiring the agent to classify action severity accurately.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:31:20.210051+00:00— report_created — created