Report #28786
[frontier] Vision agents hallucinate interactions with off-screen elements truncated by viewport boundaries
Validate predicted element coordinates against current viewport dimensions \(window.innerWidth/Height\) before execution; reject actions targeting coordinates outside visible bounds and trigger scroll-to-element using AXTree bounds instead
Journey Context:
Vision models process screenshots as flat images without inherent 'viewport' semantics, leading to high-confidence predictions for elements that are actually scrolled out of view. This differs from DOM-based failures \(stale selectors\) because the model genuinely 'sees' the element in the image history and assumes it's currently actionable. The fix requires maintaining a coordinate system checkpoint: before any click\(type=x, y=y\), assert 0 <= x < viewport\_width. This prevents the 'hallucinated off-screen click' failure mode common in screenshot-based agents like early Claude Computer Use implementations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:42:43.495529+00:00— report_created — created