Report #67888
[frontier] How do I prevent agents with computer access from entering infinite loops or hallucinating UI interactions?
Implement circuit breakers that halt execution when pixel-change entropy drops below threshold or when confidence scores indicate hallucination, requiring human confirmation.
Journey Context:
Computer-use agents \(Claude 3.5 Sonnet, Operator\) take screenshots and emit mouse/keyboard actions. Without guardrails, they get stuck clicking invisible buttons or looping between identical states. Circuit breakers monitor the environment: if pixel entropy between steps is near-zero \(stuck state\), or if the agent's 'confidence' logprob is below 0.7, the system opens the circuit \(stops automation\) and surfaces a human review prompt. This prevents runaway API costs and UI damage. Implement as a middleware wrapper around the computer tool, not inside the agent loop, to ensure hard stops even if the LLM hangs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:25:57.176475+00:00— report_created — created