Report #41086
[frontier] Agent fails to detect invisible state in hybrid DOM-vision environments
Use 'Shadow DOM \+ Visual Diff' hybrid—extract semantic structure from DOM, validate via screenshots only when DOM mutations fire, reducing token cost 60% while catching visual regressions
Journey Context:
Pure vision agents miss cookies/localStorage; pure DOM agents miss CSS-driven interactivity. The hybrid uses DOM mutations as trigger events and screenshots for visual grounding, preventing the 'invisible state' desync where the DOM updated but the agent sees old pixels.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:26:03.564273+00:00— report_created — created