Report #27183
[frontier] Infinite loops when the agent cannot detect that the UI has reached its target state because of animation or visual similarity
Replace pixel-perfect comparison with perceptual hashing (pHash) and the structural similarity index (SSIM) to distinguish meaningful state changes from loading spinners and other animation noise
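The two comparisons named above can be sketched in dependency-free Python. This is a minimal sketch, not a reference implementation, and it assumes screenshots have already been converted to grayscale 2-D lists with values 0..255 (that capture and conversion step is not shown). `phash` follows the common DCT recipe: box-downscale to 32x32, take a 2-D DCT-II, keep the 8x8 lowest-frequency coefficients (this sketch drops the DC term), and set one bit per coefficient above their median. `mean_ssim` is simplified: it averages the standard per-window SSIM formula over non-overlapping 8x8 blocks instead of a Gaussian sliding window. The function names and defaults are illustrative, not a specific library's API.

```python
import math
from statistics import median

Image = list[list[float]]  # grayscale screenshot, row-major, values 0..255


def downscale(img: Image, size: int) -> Image:
    """Box-average img down to size x size (both dimensions must be >= size)."""
    h, w = len(img), len(img[0])
    assert h >= size and w >= size, "image smaller than target size"
    out = []
    for i in range(size):
        rows = range(i * h // size, (i + 1) * h // size)
        out_row = []
        for j in range(size):
            cols = range(j * w // size, (j + 1) * w // size)
            cell = [img[r][c] for r in rows for c in cols]
            out_row.append(sum(cell) / len(cell))
        out.append(out_row)
    return out


def phash(img: Image, size: int = 32, keep: int = 8) -> list[bool]:
    """DCT-based perceptual hash: keep*keep low frequencies minus DC (63 bits)."""
    small = downscale(img, size)
    # cos_table[k][n] is the DCT-II basis value for frequency k at sample n.
    cos_table = [[math.cos(math.pi * (n + 0.5) * k / size) for n in range(size)]
                 for k in range(keep)]
    # 1-D DCT-II along each row, keeping only the lowest `keep` frequencies.
    row_dct = [[sum(row[n] * cos_table[k][n] for n in range(size)) for k in range(keep)]
               for row in small]
    # 1-D DCT-II down each kept column gives the keep x keep low-frequency block.
    block = [[sum(row_dct[m][v] * cos_table[u][m] for m in range(size)) for v in range(keep)]
             for u in range(keep)]
    # Drop the DC term (overall brightness), then threshold at the median.
    coeffs = [block[u][v] for u in range(keep) for v in range(keep) if (u, v) != (0, 0)]
    med = median(coeffs)
    return [c > med for c in coeffs]


def hamming(a: list[bool], b: list[bool]) -> int:
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def mean_ssim(a: Image, b: Image, win: int = 8, peak: float = 255.0) -> float:
    """Mean SSIM over non-overlapping win x win blocks (no Gaussian window)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    scores = []
    for r in range(0, len(a) - win + 1, win):
        for c in range(0, len(a[0]) - win + 1, win):
            xs = [a[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            ys = [b[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            vx = sum((x - mx) ** 2 for x in xs) / n
            vy = sum((y - my) ** 2 for y in ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
            # Luminance term times contrast-structure term (standard SSIM form).
            scores.append((2 * mx * my + c1) * (2 * cov + c2)
                          / ((mx * mx + my * my + c1) * (vx + vy + c2)))
    return sum(scores) / len(scores)
```

Frames that differ only by noise are expected to give a Hamming distance of a few bits out of 63 and a `mean_ssim` close to 1.0, while a structural change pushes both further apart; the actual cut-offs are assumptions that would need tuning on real screenshots.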
Journey Context:
Agents often check 'did the page change?' by taking before/after screenshots and testing whether any pixels differ. This fails on (1) animated loading spinners, whose pixels are always changing, (2) video backgrounds, (3) blinking cursors, and (4) subtle hover effects. Each of these produces pixel differences unrelated to the target state, so the agent loops forever, thinking the page is still loading. Observing only the DOM is not enough either, because it misses changes rendered into a canvas.

The solution is perceptual hashing (pHash), which is robust to minor pixel noise, or SSIM (Structural Similarity Index), which detects structural changes while ignoring noise. Additionally, mask out known animated regions before comparison; these can be detected as pixels with consistently high variance across frames sampled at a high frame rate. Only declare 'state changed' when the perceptual hash difference exceeds a threshold AND the accessibility tree has stabilized.
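The masking step and the two-condition rule can be sketched in Python as well. This block is self-contained and rests on several assumptions: the agent can grab a short burst of frames while it waits, an "animated" pixel is simply one whose variance across that burst exceeds a fixed cutoff, an average hash (aHash) stands in for pHash to keep the code short, and accessibility-tree snapshots are available as serialized strings. `animation_mask`, `apply_mask`, `average_hash`, `tree_stable`, `state_changed` and every threshold are hypothetical.

```python
from statistics import pvariance

Image = list[list[float]]  # grayscale screenshot, row-major, values 0..255
Mask = list[list[bool]]    # True = pixel treated as animated and ignored


def animation_mask(frames: list[Image], var_threshold: float = 25.0) -> Mask:
    """Flag pixels whose value varies across a short burst of frames.

    Assumes the frames were captured in quick succession, so high per-pixel
    variance indicates animation (spinner, video, cursor) rather than navigation.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[pvariance([f[y][x] for f in frames]) > var_threshold for x in range(w)]
            for y in range(h)]


def apply_mask(img: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Overwrite masked pixels with a constant so they cannot register as change."""
    return [[fill if m else v for v, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


def average_hash(img: Image, size: int = 8) -> list[bool]:
    """aHash, a simpler stand-in for pHash: box-average to size x size,
    then one bit per cell recording whether it is brighter than the mean."""
    h, w = len(img), len(img[0])
    cells = []
    for i in range(size):
        for j in range(size):
            block = [img[y][x] for y in range(i * h // size, (i + 1) * h // size)
                     for x in range(j * w // size, (j + 1) * w // size)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [c > mean for c in cells]


def tree_stable(snapshots: list[str], k: int = 3) -> bool:
    """Hypothetical rule: the last k accessibility-tree snapshots are identical."""
    return len(snapshots) >= k and len(set(snapshots[-k:])) == 1


def state_changed(before: Image, after: Image, mask: Mask, tree_snapshots: list[str],
                  hash_threshold: int = 5, stable_k: int = 3) -> bool:
    """Report a change only if the masked hashes differ by more than
    hash_threshold bits AND the accessibility tree has stopped changing."""
    hb = average_hash(apply_mask(before, mask))
    ha = average_hash(apply_mask(after, mask))
    distance = sum(x != y for x, y in zip(hb, ha))
    return distance > hash_threshold and tree_stable(tree_snapshots, stable_k)
```

Masking trades sensitivity for stability: a genuine change confined to a masked region (for example, a spinner replaced in place by a checkmark) is invisible to the hash comparison in this sketch.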
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:01:22.539983+00:00— report_created — created