Report #99580
[frontier] Screenshot agents get manipulated by fake pop-ups, ads, or instructions embedded in images.
Before executing a consequential UI action, require visual-DOM consensus: verify the same element and intent in both the screenshot and the accessibility tree/HTML; abort or escalate if the two modalities disagree.
Journey Context:
Computer-use agents are uniquely exposed to prompt injection via on-screen content: webpages, PDFs, and fake OS dialogs can all contain instructions that the model follows. OSWorld safety benchmarks (OS-Harm) and the CaMeLs defense framework show that redundancy defenses like DOM Consistency and Multi-Modal Consensus meaningfully reduce attack success. Screenshot-only agents are the most vulnerable; a11y-only agents miss visually rendered tricks. The emerging hardening pattern is to treat neither modality as authoritative and to require agreement before clicks, form submissions, or downloads.
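A minimal sketch of the consensus gate described above, not a reference implementation. Everything named here is an assumption made for illustration: the `VisualTarget` and `DomNode` shapes, the `ROLES_FOR_INTENT` table, the 0.5 overlap threshold, and the choice to abort on structural disagreement but escalate on a label mismatch. A real agent would fill these structures from its own vision model and accessibility-tree source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROCEED = "proceed"
    ESCALATE = "escalate"  # hand off to a human or a stricter policy
    ABORT = "abort"


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class VisualTarget:
    """What the screenshot model believes it is acting on (hypothetical shape)."""
    label: str
    box: Box
    intent: str  # e.g. "click", "submit", "download"


@dataclass(frozen=True)
class DomNode:
    """Candidate node from the accessibility tree or HTML (hypothetical shape)."""
    role: str
    name: str
    box: Box


# Assumed policy table: which roles may legitimately carry which intent.
ROLES_FOR_INTENT = {
    "click": {"button", "link", "checkbox", "menuitem"},
    "submit": {"button"},
    "download": {"link", "button"},
}


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not count."""
    return " ".join(text.casefold().split())


def check_consensus(
    visual: VisualTarget, node: Optional[DomNode], min_iou: float = 0.5
) -> Decision:
    # No structural counterpart: the target exists only in pixels, which is
    # what a fake dialog or an instruction baked into an image looks like.
    if node is None:
        return Decision.ABORT
    # The two modalities point at different places on screen.
    if iou(visual.box, node.box) < min_iou:
        return Decision.ABORT
    # The node's role cannot carry the intended action.
    if node.role not in ROLES_FOR_INTENT.get(visual.intent, set()):
        return Decision.ABORT
    # Same place and a plausible role, but the text disagrees: ambiguous.
    if normalize(visual.label) != normalize(node.name):
        return Decision.ESCALATE
    return Decision.PROCEED
```

In this sketch the agent would call `check_consensus` only for consequential actions and execute the action only on `Decision.PROCEED`. Neither input is trusted on its own: a visual target with no matching node is rejected, and a matching node whose name differs from the rendered label is escalated rather than clicked.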
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:22:38.963037+00:00— report_created — created