Report #44494
[frontier] Vision models hallucinate clickable buttons or input fields in static images, CSS background gradients, or non-interactive mockups that look like UI but have no DOM event listeners
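Run an 'interaction probe' before each main action: use lightweight JavaScript execution to check `element.onclick`, or use CDP `DOMDebugger.getEventListeners` to confirm that event listeners are attached, before clicking a visually detected element. Note that `getEventListeners(element)` is a DevTools console utility, not a `window` method available to page scripts, and that `element.onclick` does not reflect handlers added with `addEventListener`.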
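A minimal sketch of such a probe over CDP, assuming the agent already holds an open CDP session. The `send` adapter, the `hasClickListener` name, and the `CLICK_EVENTS` set are hypothetical and only for illustration; `Runtime.evaluate` and `DOMDebugger.getEventListeners` are the protocol methods being called.

```javascript
// Event types treated as evidence of click handling (an assumption, adjust as needed).
const CLICK_EVENTS = new Set(["click", "mousedown", "mouseup", "pointerdown", "pointerup"]);

// `send` is a hypothetical adapter: (method, params) => Promise<result>,
// typically a thin wrapper around a CDP session's own send method.
async function hasClickListener(send, selector) {
  // Resolve the selector to a remote object handle inside the page.
  const { result } = await send("Runtime.evaluate", {
    expression: `document.querySelector(${JSON.stringify(selector)})`,
  });
  if (!result || !result.objectId) return false; // selector matched nothing

  // Ask the browser which listeners are registered on that node.
  const { listeners } = await send("DOMDebugger.getEventListeners", {
    objectId: result.objectId,
  });
  return (listeners || []).some((l) => CLICK_EVENTS.has(l.type));
}
```

With Playwright on Chromium, for instance, a session obtained from `context.newCDPSession(page)` exposes `send(method, params)` and could back the adapter. The probe only sees listeners registered on the node itself, so pages that delegate clicks to an ancestor would need the same check repeated up the parent chain.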
Journey Context:
Set-of-Mark and visual grounding techniques teach models to identify UI via bounding boxes, but they often mark decorative elements as interactive. The failure is attempting to click a 'Submit' button that is actually a `.png` background image with no interactive tag behind it. The fix is to never execute clicks based solely on visual appearance: always verify through DOM inspection that the target has attached event listeners or is a known interactive tag.
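The rule above can be sketched as a small gate that runs on a DOM snapshot of the candidate element before any click is issued. The snapshot shape (`tagName`, `attributes`, `listenerTypes`), the `isClickable` name, and the tag list are assumptions made for illustration, not part of any library.

```javascript
// Tags treated as natively interactive (an illustrative, non-exhaustive list).
const INTERACTIVE_TAGS = new Set(["button", "input", "select", "textarea", "summary"]);

// snapshot: { tagName: string, attributes?: object, listenerTypes?: string[] }
// (hypothetical shape, filled in by whatever DOM inspection the agent performs)
function isClickable(snapshot) {
  const tag = String(snapshot.tagName || "").toLowerCase();
  const attrs = snapshot.attributes || {};
  const listeners = snapshot.listenerTypes || [];

  // A disabled control should not be clicked even if it looks like a button.
  if ("disabled" in attrs) return false;

  // An anchor is only interactive on its own when it carries an href.
  const nativeControl = tag === "a" ? "href" in attrs : INTERACTIVE_TAGS.has(tag);

  // Accept a known interactive tag, or any element with a click listener attached.
  return nativeControl || listeners.includes("click");
}

// A decorative div with a PNG background and no listeners is rejected.
const decorative = {
  tagName: "DIV",
  attributes: { style: "background-image: url(submit.png)" },
  listenerTypes: [],
};
console.log(isClickable(decorative)); // false
```

Keeping the decision in a pure function separates it from how the snapshot is collected, so the same gate can sit behind a CDP probe or an in-page script.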
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:09:10.467320+00:00— report_created — created