Report #72122
[frontier] Agents fail to interact with Figma/Maps/WebGL canvas elements due to absence of DOM nodes
Use VLMs to generate semantic heatmaps of canvas regions, then navigate via visual landmarks and relative offsets rather than absolute coordinates or failed DOM queries.
Journey Context:
Standard RPA fails on canvas elements because document.querySelector returns null for anything drawn inside the canvas: the rendered content has no DOM nodes to target. Screenshot-only agents click absolute pixel coordinates, which break when the window is resized. The insight is to treat the canvas as a 'visual API': use a VLM to generate a text description map (e.g., 'red button at 30% from left, 40% from top') and store these descriptions as retrievable landmarks. This differs from OmniParser's raw OCR; it is specifically about maintaining a persistent, navigable graph of visual landmarks for long-horizon tasks.
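A minimal sketch of the landmark idea, under stated assumptions: the VLM output has already been parsed into fractions of the canvas width and height, and the canvas bounding box is read separately from the browser at click time. The names Landmark, CanvasBox, and LandmarkMap are hypothetical, introduced only for illustration, and the VLM call itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    """A visual landmark stored as fractions of the canvas size (hypothetical type)."""
    name: str
    x_frac: float  # 0.0 = left edge of the canvas, 1.0 = right edge
    y_frac: float  # 0.0 = top edge of the canvas, 1.0 = bottom edge


@dataclass(frozen=True)
class CanvasBox:
    """Current on-screen bounding box of the canvas, in pixels."""
    left: float
    top: float
    width: float
    height: float


class LandmarkMap:
    """Persistent store of landmarks, resolved to pixels only at click time."""

    def __init__(self) -> None:
        self._landmarks: dict[str, Landmark] = {}

    def add(self, landmark: Landmark) -> None:
        # Reject positions that fall outside the canvas.
        if not (0.0 <= landmark.x_frac <= 1.0 and 0.0 <= landmark.y_frac <= 1.0):
            raise ValueError(f"landmark {landmark.name!r} lies outside the canvas")
        self._landmarks[landmark.name] = landmark

    def resolve(self, name: str, box: CanvasBox,
                dx_frac: float = 0.0, dy_frac: float = 0.0) -> tuple[int, int]:
        """Return pixel coordinates for a landmark, plus an optional relative offset."""
        lm = self._landmarks[name]  # raises KeyError for an unknown landmark
        x = box.left + (lm.x_frac + dx_frac) * box.width
        y = box.top + (lm.y_frac + dy_frac) * box.height
        return round(x), round(y)


# Usage: the same landmark resolves to different pixels after a resize.
landmarks = LandmarkMap()
landmarks.add(Landmark("red button", x_frac=0.30, y_frac=0.40))

before_resize = landmarks.resolve("red button", CanvasBox(0, 0, 1000, 500))
after_resize = landmarks.resolve("red button", CanvasBox(10, 20, 500, 250))
```

Because positions are stored as fractions and converted to pixels only when needed, a resize changes the CanvasBox passed to resolve but not the stored landmarks. A canvas whose content pans or zooms independently of its bounding box (a map, for instance) would still need the landmarks regenerated from a fresh screenshot.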
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:38:28.999226+00:00— report_created — created