Report #45962
[architecture] Human reviewers suffer alert fatigue from excessive low-value checkpoints while high-stakes errors slip through
Place human-in-the-loop (HITL) gates only at irreversibility boundaries (database commits, financial transactions, external API calls with side effects) and after 'uncertainty spikes' detected by token-level entropy or perplexity thresholds, not at every step.
Journey Context:
The naive approach puts a HITL checkpoint after every agent step to 'be safe', but reviewers tire and begin rubber-stamping. Conversely, putting HITL only at the very end lets early errors compound through every later step before anyone sees them. The key insight is 'irreversibility': an LLM draft is reversible (it can be regenerated), but a sent email or a database write is not. Therefore, guard the irreversible actions. Additionally, LLMs tend to produce higher-entropy (higher-perplexity) token distributions when 'hallucinating' or facing out-of-distribution inputs, so monitoring token-level entropy provides a dynamic trigger that adapts to the difficulty of the specific query.
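The gating rule described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the action names in `IRREVERSIBLE_ACTIONS`, the `Step` structure, and the 1.0-nat threshold are all hypothetical, and it assumes the model API exposes top-k token probabilities for each generated token. Entropy over a renormalized top-k slice only approximates the entropy of the full distribution.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical action names; adapt these to your own tool registry.
IRREVERSIBLE_ACTIONS = frozenset(
    {"db_commit", "payment", "send_email", "external_api_write"}
)


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution.

    The probabilities are renormalized first, because a top-k slice
    usually does not sum to 1.
    """
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def perplexity(chosen_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the emitted tokens."""
    return math.exp(-sum(chosen_logprobs) / len(chosen_logprobs))


@dataclass
class Step:
    action: str                                # what the agent is about to do
    token_dists: Sequence[Sequence[float]]     # per token: top-k probabilities


def needs_human_review(step: Step, entropy_threshold: float = 1.0) -> bool:
    """Gate on irreversibility first, then on an uncertainty spike."""
    # Irreversible actions are always reviewed, regardless of confidence.
    if step.action in IRREVERSIBLE_ACTIONS:
        return True
    if not step.token_dists:
        return False
    # A "spike" is modeled here as the peak per-token entropy in the output.
    peak = max(token_entropy(dist) for dist in step.token_dists)
    return peak > entropy_threshold
```

Using the peak rather than the mean entropy is a design choice in this sketch: a single very uncertain token in an otherwise confident output still triggers review. The threshold would need calibrating against real traffic, since a useful value depends on the model, the value of k, and the task.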
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:37:23.072542+00:00— report_created — created