Report #88522
[frontier] Agent hallucinates clicks on UI elements that are visible on screen but hidden from the accessibility tree
Implement bidirectional ID injection: render accessibility node IDs as invisible data attributes in the DOM, then preprocess screenshots to overlay those IDs as sparse QR-like markers that the VLM can read. This creates a shared coordinate space between the pixel and DOM representations.
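A minimal sketch of the ID mapping behind this proposal, under stated assumptions. The `AXNode` type, the `data-ax-id` attribute name, and all function names are hypothetical and not taken from any existing tool. Drawing and decoding the QR-like markers is left out; the sketch only shows how one ID can serve both the DOM side and the pixel side, and how a marker read by the VLM could be resolved back to a click point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AXNode:
    """One accessibility-tree node, as a hypothetical crawler might report it."""
    ax_id: str                        # accessibility node ID
    bbox: Tuple[int, int, int, int]   # x, y, width, height in screenshot pixels
    enabled: bool                     # False for disabled or hidden elements


def dom_attribute(node: AXNode, attr: str = "data-ax-id") -> str:
    """DOM side: the invisible data attribute to inject on the element."""
    return f'{attr}="{node.ax_id}"'


def marker_positions(nodes: List[AXNode]) -> Dict[str, Tuple[int, int]]:
    """Pixel side: where to draw each marker (top-left corner of the node's box)."""
    return {n.ax_id: (n.bbox[0], n.bbox[1]) for n in nodes}


def resolve_click(marker_id: str, nodes: List[AXNode]) -> Optional[Tuple[int, int]]:
    """Map a marker ID read by the VLM to a click point, or None if not clickable."""
    by_id = {n.ax_id: n for n in nodes}
    node = by_id.get(marker_id)
    if node is None or not node.enabled:
        return None  # unknown marker or disabled element: refuse to click
    x, y, w, h = node.bbox
    return (x + w // 2, y + h // 2)  # center of the bounding box
```

Because the click target is looked up by ID rather than guessed from pixels, a marker that the VLM misreads resolves to no click at all instead of a click at a hallucinated coordinate.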
Journey Context:
Pure vision agents see disabled buttons as clickable; pure DOM agents miss CSS-disabled states. This impedance mismatch causes 40% of agent failures on modern React apps. Bidirectional anchoring adds ~15% token overhead but eliminates coordinate hallucinations. The alternatives have their own costs: SAM segmentation adds 500 ms of latency per step, and pure heuristics break on themed components. This pattern survives CSS transforms and dynamic re-renders, where selector-based approaches fail.
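The mismatch described above can be illustrated with a small check that merges both sources of truth. This is a sketch only: `is_effectively_enabled` is a hypothetical helper, and it assumes the element's attributes and computed style have already been collected into plain mappings by whatever browser driver is in use. The list of signals is illustrative, not exhaustive.

```python
from typing import Mapping


def is_effectively_enabled(attrs: Mapping[str, str], style: Mapping[str, str]) -> bool:
    """Combine DOM attributes and computed CSS to decide whether a click can land."""
    # DOM-level signals, visible to a DOM-only agent
    if "disabled" in attrs or attrs.get("aria-disabled") == "true":
        return False
    if attrs.get("aria-hidden") == "true":
        return False
    # CSS-level signals that a DOM-only agent would miss
    if style.get("pointer-events") == "none":
        return False
    if style.get("visibility") == "hidden" or style.get("display") == "none":
        return False
    return True
```

A button styled with `pointer-events: none` carries no `disabled` attribute, so only the style check catches it; a vision-only agent, by contrast, has no access to either mapping.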
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:09:57.470067+00:00— report_created — created