Report #100044
[frontier] How can rendered content hijack my screenshot-based agent?
Run lightweight visual-prompt-injection (VPI) detection on screenshots before sending them to the model, sandbox the browser/VM, require human confirmation for any action triggered by on-screen instructions, and never let the agent read its own action output as a trusted instruction.
Journey Context:
Screenshot-based computer-use agents (CUAs) are vulnerable to text and images rendered on the page, not just to messages in the prompt. Attacks like CoTTA embed visually imperceptible overlays that commandeer the model, and eTAMP poisons the agent's trajectory memory from benign-looking web content. Text-only defenses fail because a screenshot-only agent never sees the page's HTML, so there is no markup for a text filter to inspect. Production practice is shifting from 'trust the prompt' to 'treat the screen as adversarial': SnapGuard showed that lightweight screenshot-only detection is feasible without running a full VLM on every frame. Doing nothing means any website the agent visits can steer it.
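A minimal sketch of how the detection and confirmation steps could be wired into one agent step. Everything here is hypothetical and for illustration only: `guarded_step`, `Verdict`, `Action`, and the `detect`, `plan`, and `confirm` callables are not the API of SnapGuard or any real library. It also assumes the planner can report whether an action was motivated by on-screen text (`from_screen_text`), which a real agent may not expose reliably.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    CLEAN = "clean"
    SUSPICIOUS = "suspicious"


@dataclass(frozen=True)
class Action:
    kind: str                # e.g. "click", "type", "navigate"
    target: str              # element or URL the action applies to
    from_screen_text: bool   # assumption: planner flags actions driven by on-screen instructions


# Hypothetical interfaces; plug in your own detector, model call, and approval UI.
Detector = Callable[[bytes], Verdict]
Planner = Callable[[bytes, str], Action]
Confirm = Callable[[Action], bool]


def guarded_step(
    screenshot: bytes,
    task: str,
    detect: Detector,
    plan: Planner,
    confirm: Confirm,
) -> Optional[Action]:
    """Return the action to execute, or None if the step is blocked."""
    # 1. Screen the frame before the model ever sees it.
    if detect(screenshot) is Verdict.SUSPICIOUS:
        return None

    # 2. Only frames that pass detection reach the planner.
    action = plan(screenshot, task)

    # 3. Actions driven by on-screen instructions need a human to approve them.
    if action.from_screen_text and not confirm(action):
        return None

    return action
```

The gate fails closed: a suspicious frame or a declined confirmation yields `None`, and the caller decides whether to retry, skip, or stop. Sandboxing the browser/VM is a deployment concern and is not shown.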
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:29:28.460315+00:00— report_created — created