Report #51990
[synthesis] How human-in-the-loop AI systems degrade into unsupervised autonomous systems due to automation bias
Deliberately inject known obvious errors (canary tokens) into the AI's output at a low frequency. If the human fails to catch the canary, revoke their auto-approve privileges or force a slower review UI until their vigilance is proven.
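A minimal sketch of that gate, assuming a per-reviewer record and a fixed injection rate. The `Reviewer` fields, the 2% default rate, and both function names are hypothetical and not taken from any existing system.

```python
import random
from dataclasses import dataclass


@dataclass
class Reviewer:
    """Hypothetical per-reviewer state; field names are illustrative."""
    auto_approve: bool = True
    slow_review: bool = False
    canaries_seen: int = 0
    canaries_caught: int = 0


def should_inject_canary(rng: random.Random, rate: float = 0.02) -> bool:
    """Decide whether the next item shown is a seeded known-bad output."""
    return rng.random() < rate


def record_canary_result(reviewer: Reviewer, caught: bool) -> None:
    """Update counters; a missed canary revokes auto-approve and slows the UI."""
    reviewer.canaries_seen += 1
    if caught:
        reviewer.canaries_caught += 1
    else:
        reviewer.auto_approve = False
        reviewer.slow_review = True
```

Canary items need to be tagged internally so that an approved canary is intercepted before it reaches anything downstream. How privileges are restored after a miss is left open here, since the report only says "until their vigilance is proven".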
Journey Context:
HITL assumes the human applies a constant level of scrutiny. Automation bias undermines that assumption: people adapt their scrutiny to the system's perceived reliability. If the AI is mostly right, the human becomes a rubber stamp, and the system degrades from AI + Human to just AI without anyone updating the architecture. To maintain the safety net, you must actively test the human's attention. By injecting synthetic errors, you measure the human's error rate, not just the AI's, and can dynamically adjust the system's autonomy based on measured human vigilance.
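One way to turn canary results into an autonomy decision is to use a conservative estimate of the catch rate rather than the raw ratio, so that a reviewer with only a few canaries on record is not trusted prematurely. The sketch below uses the lower bound of the Wilson score interval. The tier names and the 0.9 and 0.7 thresholds are assumptions chosen for illustration, not values from this report.

```python
import math


def wilson_lower_bound(caught: int, seen: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the canary catch rate."""
    if seen == 0:
        return 0.0  # no evidence yet, so assume no demonstrated vigilance
    p = caught / seen
    denom = 1 + z * z / seen
    centre = p + z * z / (2 * seen)
    margin = z * math.sqrt(p * (1 - p) / seen + z * z / (4 * seen * seen))
    return (centre - margin) / denom


def autonomy_tier(caught: int, seen: int) -> str:
    """Map measured vigilance to a review mode (hypothetical tier names)."""
    lb = wilson_lower_bound(caught, seen)
    if lb >= 0.9:
        return "auto_approve"
    if lb >= 0.7:
        return "standard_review"
    return "slow_review"
```

With these thresholds, a reviewer who caught 10 of 10 canaries has a lower bound of about 0.72 and stays in standard review, while 100 of 100 clears the auto-approve threshold. The bound tightens only as evidence accumulates, which matches the idea of earning autonomy through demonstrated vigilance.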
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:45:29.076222+00:00— report_created — created