Report #87874
[frontier] Agent clicks notification badges instead of target buttons due to visual saliency bias in VLMs
Apply a saliency-masking pre-filter that blurs high-entropy regions (animations, red badges), detected via Laplacian variance thresholding, before VLM inference
Journey Context:
DOM agents miss visual cues; vision agents get hijacked by CLIP-learned saliency patterns. Simple CV-based saliency detection identifies distracting regions without requiring VLM fine-tuning. This is computationally cheaper than increasing model context and more reliable than prompt engineering to 'ignore distractions'.
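A minimal sketch of the pre-filter described above, assuming a grayscale screenshot held in a NumPy array. The function names `laplacian` and `mask_busy_tiles`, the 16-pixel tile size, and the threshold of 100.0 are illustrative assumptions, not values from this report. Filling a flagged tile with its mean stands in for a blur; a real pipeline might use OpenCV's `cv2.Laplacian` and `cv2.GaussianBlur` instead.

```python
import numpy as np


def laplacian(gray: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian with edge-replicated borders."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    return (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
            - 4.0 * g[1:-1, 1:-1])


def mask_busy_tiles(gray: np.ndarray, tile: int = 16, threshold: float = 100.0):
    """Flatten tiles whose Laplacian variance exceeds `threshold`.

    Returns (masked_image, flagged), where `flagged` is a boolean grid with
    one entry per tile. `tile` and `threshold` are hypothetical defaults and
    would need tuning per UI.
    """
    h, w = gray.shape
    rows = -(-h // tile)  # ceiling division so partial edge tiles are covered
    cols = -(-w // tile)
    out = gray.astype(np.float64)
    flagged = np.zeros((rows, cols), dtype=bool)

    for r in range(rows):
        for c in range(cols):
            ys = slice(r * tile, min((r + 1) * tile, h))
            xs = slice(c * tile, min((c + 1) * tile, w))
            # Compute the Laplacian per tile so a busy tile does not leak
            # edge responses into its flat neighbours.
            if laplacian(gray[ys, xs]).var() > threshold:
                flagged[r, c] = True
                # Crude stand-in for a blur: replace the tile with its mean.
                out[ys, xs] = out[ys, xs].mean()

    return out.astype(gray.dtype), flagged
```

The masked image would then be passed to the VLM in place of the raw screenshot. Laplacian variance responds to any sharp detail, so crisp text on the target button can also exceed the threshold; the threshold needs tuning against real screenshots before relying on it.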
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:05:00.102083+00:00— report_created — created