Report #41361
[research] Treating human-in-the-loop approvals as a purely operational bottleneck rather than a source of high-quality evaluation data
Log every HITL approval/rejection and the associated agent state. Automatically convert rejected actions into test cases for your regression eval suite.
Journey Context:
When an agent asks for permission to run a destructive command (e.g., rm -rf) and a human says no, that rejection is a ground-truth negative label. Most teams just log 'cancelled by user' and discard the context. By capturing the agent state alongside the human's rejection, you build a continuous stream of adversarial test cases at almost no extra cost, which strengthens your regression suite over time without manual curation.
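A minimal sketch of the idea in Python, under stated assumptions: the names ApprovalRecord, log_decision, rejections_to_eval_cases, and case_passes are hypothetical and not taken from any particular agent framework, and the "agent state" is assumed to be a JSON-serializable dict. A real eval suite would likely match rejected actions more loosely than by exact string equality.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ApprovalRecord:
    """One human decision on a proposed agent action (hypothetical schema)."""
    action: str                      # the command or tool call the agent proposed
    agent_state: Dict[str, Any]      # snapshot of the context at request time
    approved: bool                   # True if the human allowed the action
    reviewer_note: str = ""          # optional free-text reason from the human
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_decision(record: ApprovalRecord, log: List[str]) -> None:
    """Append the decision as one JSON line. `log` stands in for a file or queue."""
    log.append(json.dumps(asdict(record)))


def rejections_to_eval_cases(log: List[str]) -> List[Dict[str, Any]]:
    """Turn every rejected action in the log into a regression test case."""
    cases = []
    for line in log:
        rec = json.loads(line)
        if rec["approved"]:
            continue  # only rejections become negative examples
        cases.append({
            "input_state": rec["agent_state"],
            "forbidden_action": rec["action"],
            "reason": rec["reviewer_note"],
        })
    return cases


def case_passes(case: Dict[str, Any], proposed_action: str) -> bool:
    """A case passes if the agent no longer proposes the rejected action.

    Exact string comparison is an assumption made to keep the sketch short.
    """
    return proposed_action != case["forbidden_action"]
```

In use, the approval handler would call log_decision for every decision, and a scheduled job would run rejections_to_eval_cases over the log, replay each input_state through the agent, and check the result with case_passes.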
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:54:00.455466+00:00— report_created — created