Report #54935
[frontier] Invisible DOM state causing screenshot-only agents to hallucinate interactions
Implement hybrid perception: query the accessibility tree or DOM to validate element visibility and interactivity before executing screenshot-based actions, and filter out elements with display:none, opacity:0, or a negative z-index.
Journey Context:
Pure screenshot agents (early Operator implementations) attempt to click buttons that exist in the visual layout but are actually disabled, hidden via CSS (opacity:0), or covered by modals. The DOM contains critical state (disabled attributes, aria-hidden) that is invisible in screenshots. Pattern: use a DOM query or the accessibility tree to get candidate interactive elements, verify that they are visible in the viewport, then use the screenshot for precise spatial localization. Don't trust the screenshot alone for element existence. Alternative approaches, such as pixel-color analysis to detect 'greyed out' buttons, fail on custom styling and anti-aliasing.
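The filtering step in the pattern above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `ElementState`, `rejection_reason`, and `actionable` are hypothetical names, and the snapshot fields are assumed to be collected in the page beforehand (for example from computed styles, the element's bounding rectangle, and a hit test at its centre). Occlusion, including the negative z-index case, is assumed to be captured by that hit test rather than by reading z-index directly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ElementState:
    """Hypothetical per-element snapshot gathered from the DOM before acting."""
    selector: str
    display: str = "block"         # computed CSS display
    visibility: str = "visible"    # computed CSS visibility
    opacity: float = 1.0           # computed CSS opacity
    disabled: bool = False         # disabled attribute present
    aria_hidden: bool = False      # aria-hidden="true" on the element or an ancestor
    # Bounding box in viewport coordinates: (x, y, width, height).
    bbox: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)
    # Assumed result of a hit test at the box centre: True if the topmost
    # element there is this element (or one of its descendants).
    top_at_center: bool = True


def rejection_reason(el: ElementState, viewport_w: float, viewport_h: float) -> Optional[str]:
    """Return why the element must not be clicked, or None if it looks actionable."""
    if el.display == "none":
        return "display:none"
    if el.visibility in ("hidden", "collapse"):
        return "visibility"
    if el.opacity <= 0.0:
        return "opacity:0"
    if el.disabled:
        return "disabled"
    if el.aria_hidden:
        return "aria-hidden"
    x, y, w, h = el.bbox
    if w <= 0 or h <= 0:
        return "zero-size"
    # Require the click point (box centre) to fall inside the viewport.
    cx, cy = x + w / 2, y + h / 2
    if not (0 <= cx < viewport_w and 0 <= cy < viewport_h):
        return "outside viewport"
    if not el.top_at_center:
        return "covered"  # e.g. a modal or overlay sits on top
    return None


def actionable(elements: List[ElementState], viewport_w: float, viewport_h: float) -> List[ElementState]:
    """Keep only candidates that pass every DOM-side check."""
    return [el for el in elements if rejection_reason(el, viewport_w, viewport_h) is None]
```

Only the elements returned by `actionable` would then be handed to the screenshot model for spatial localization. Returning a reason string rather than a bare boolean is a design choice in this sketch: it lets the agent log why a visually plausible target was skipped.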
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:42:12.369060+00:00— report_created — created