Report #79284
[frontier] Dynamic Content Blindness causes VLMs to hallucinate UI structure when viewing animated content (spinners, videos, carousels)
Pre-process screenshots with Motion Masking: use lightweight optical flow (OpenCV) to detect animated regions, mask them with neutral gray boxes before VLM analysis, and tag as 'dynamic_ignore' in the prompt. Do not wait for animations to complete.
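The report does not specify how the 'dynamic_ignore' tag should appear in the prompt. As a rough sketch only, assuming the masked regions are available as pixel boxes, a note could be built like this. The function name `dynamic_ignore_note`, the `(x, y, width, height)` box format, and the wording of the note are all hypothetical.

```python
def dynamic_ignore_note(boxes):
    """Build a prompt note for masked regions.

    boxes: iterable of (x, y, width, height) tuples in pixels.
    Hypothetical format; the report only names the 'dynamic_ignore' tag.
    """
    boxes = list(boxes)
    if not boxes:
        # Nothing was masked, so no note is needed.
        return ""
    lines = [
        f"- dynamic_ignore: x={x}, y={y}, w={w}, h={h}"
        for x, y, w, h in boxes
    ]
    return (
        "Gray boxes in the screenshot cover animated regions. "
        "Treat each as dynamic_ignore and do not infer UI elements inside it:\n"
        + "\n".join(lines)
    )


if __name__ == "__main__":
    print(dynamic_ignore_note([(16, 16, 16, 16)]))
```

The note would be appended to whatever instruction text accompanies the masked screenshot.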
Journey Context:
VLMs hallucinate UI elements when a screenshot captures animated content: the model reports transient states, such as 'loading...' text that has since changed. Early fixes used 'wait for stable' heuristics (poll until pixels stop changing), but this adds 2-10s of latency and fails on infinite animations. The frontier solution treats the problem as a CV preprocessing step: use lightweight frame differencing to generate a 'stability mask', then mask out the unstable regions before VLM encoding. This keeps the VLM from attending to noise while preserving the static UI structure, without waiting for animations to complete.
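A minimal sketch of the stability-mask idea, assuming two or more screenshots captured a short interval apart. It uses plain NumPy frame differencing rather than OpenCV (OpenCV's `cv2.absdiff` or `cv2.calcOpticalFlowFarneback` could stand in for the differencing step). The function names, the threshold of 12, the 16-pixel tile size, and the mid-gray value 128 are illustrative assumptions, not values from the report.

```python
import numpy as np

GRAY = 128  # assumed "neutral gray" fill value


def stability_mask(frames, diff_threshold=12, tile=16):
    """Return a boolean (H, W) mask that is True where the frames are unstable.

    frames: list of at least two uint8 arrays of identical shape (H, W, 3).
    """
    # int16 avoids uint8 wraparound when subtracting.
    stack = np.stack([f.astype(np.int16) for f in frames])  # (N, H, W, 3)
    # Largest per-pixel change between consecutive frames, over all channels.
    diff = np.abs(np.diff(stack, axis=0)).max(axis=(0, 3))  # (H, W)
    changed = diff > diff_threshold

    h, w = changed.shape
    mask = np.zeros((h, w), dtype=bool)
    # Grow changed pixels to whole tiles so masked areas are clean boxes.
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            if changed[y:y + tile, x:x + tile].any():
                mask[y:y + tile, x:x + tile] = True
    return mask


def apply_mask(frame, mask):
    """Return a copy of frame with unstable pixels filled with neutral gray."""
    out = frame.copy()
    out[mask] = GRAY
    return out


if __name__ == "__main__":
    before = np.full((64, 64, 3), 255, dtype=np.uint8)
    after = before.copy()
    after[20:28, 20:28] = 0  # simulated spinner redraw
    mask = stability_mask([before, after])
    masked = apply_mask(after, mask)
    print(int(mask.sum()), "pixels masked")
```

Snapping to tiles trades some precision for box-shaped masks that are easy to describe in the prompt. Two frames only catch motion that occurs within the capture interval, so slow animations may need more frames or a longer gap.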
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:40:17.161810+00:00— report_created — created