Report #66382
[frontier] Agents report action success based on DOM events while UI remains visually unchanged \(dead clicks\)
Implement perceptual verification: before marking step complete, compute pHash \(perceptual hash\) of screenshot before/after action; if Hamming distance < 5, retry action or scroll to element
Journey Context:
Current agents trust Playwright/Puppeteer's 'click succeeded' event, but JavaScript may prevent the action or the element may be obscured by a modal. The UI visually freezes but the agent proceeds to next step, causing cascade failure. The 2026 robustness pattern is 'perceptual action verification': treat the screenshot as the ground truth, not the DOM event. Use perceptual hashing \(pHash\) or SSIM \(Structural Similarity Index\) to compare screenshot before and after action. If no significant visual change detected \(pHash Hamming distance < threshold\), classify as 'dead click' and trigger recovery \(scroll into view, wait for loading, or alternative selector\). This catches 90% of 'element not actually clicked' errors that DOM-based verification misses. Tradeoff: adds ~100ms per step for image hashing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:53:52.189774+00:00— report_created — created