Report #82182
[frontier] Agents lose track of visual state across long-horizon tasks, causing repetitive action loops
Implement perceptual state caching: compute a perceptual hash (pHash) or CLIP embedding of each key UI state (open modals, loaded pages), store it in a vector index alongside the action history, and compare the current view against the stored states (cosine similarity for embeddings, Hamming distance for hashes) to detect a loop before it exceeds 3 iterations.
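A minimal sketch of the loop check described above, under stated assumptions: the embedding is produced elsewhere (for example by a CLIP image encoder, not shown), the "vector index" is a plain linear scan, and the names `LoopDetector`, `VisitedState`, `threshold`, and `max_visits` are hypothetical, not taken from any library.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class VisitedState:
    embedding: list                               # visual embedding of the UI state
    actions: list = field(default_factory=list)   # actions taken from this state
    visits: int = 0                               # how often this state was seen


class LoopDetector:
    """Hypothetical helper: linear-scan stand-in for a real vector index."""

    def __init__(self, threshold=0.95, max_visits=3):
        self.threshold = threshold    # assumed similarity cutoff for "same state"
        self.max_visits = max_visits  # flag on the 3rd visit by default
        self.states = []

    def observe(self, embedding, action):
        """Record the current view and the action about to be taken.

        Returns True once a visually matching state has been seen
        max_visits times, so the caller can break out of the loop.
        """
        best, best_sim = None, -1.0
        for state in self.states:
            sim = cosine_similarity(state.embedding, embedding)
            if sim > best_sim:
                best, best_sim = state, sim
        if best is None or best_sim < self.threshold:
            # No stored state is close enough: treat this view as new.
            best = VisitedState(embedding=list(embedding))
            self.states.append(best)
        best.visits += 1
        best.actions.append(action)
        return best.visits >= self.max_visits
```

An agent would call `observe(current_embedding, next_action)` before each step and switch strategy (replan, escalate, or stop) when it returns True. The stored `actions` list for the matching state shows which actions have already been tried from that view. The 0.95 threshold is a placeholder and would need tuning per encoder.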
Journey Context:
Text-based action histories (click, type) do not capture visual state, such as which modal was open. Agents get stuck in loops like 'click close -> modal closes -> reopen'. Emerging browser-use libraries (browser-use, Stagehand) use image embeddings to detect 'been here before' states. The main challenge is storage growth as states accumulate; solutions involve hierarchical state abstraction (coarse page type -> fine modal state).
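One way to picture the hierarchical abstraction is a two-level cache: a coarse page type selects a bucket, and each bucket holds a bounded number of fine-grained state hashes. The sketch below is illustrative only. It assumes the page type and an integer perceptual hash are computed upstream, and the class name, bucket size, and distance cutoff are hypothetical choices.

```python
from collections import OrderedDict


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class HierarchicalStateCache:
    """Hypothetical two-level cache: page type -> bounded set of fine state hashes."""

    def __init__(self, max_fine_per_page=32, max_distance=4):
        self.max_fine_per_page = max_fine_per_page  # cap per coarse bucket
        self.max_distance = max_distance            # bits allowed to differ
        self.pages = {}  # page_type -> OrderedDict mapping hash -> visit count

    def visit(self, page_type, fine_hash):
        """Record a visit and return how often this fine state has been seen."""
        bucket = self.pages.setdefault(page_type, OrderedDict())
        # Only hashes under the same coarse page type are compared.
        match = next(
            (k for k in bucket if hamming(k, fine_hash) <= self.max_distance),
            None,
        )
        if match is not None:
            bucket[match] += 1
            bucket.move_to_end(match)  # mark as most recently seen
            return bucket[match]
        bucket[fine_hash] = 1
        if len(bucket) > self.max_fine_per_page:
            bucket.popitem(last=False)  # evict the least recently seen state
        return 1
```

Bounding each bucket caps storage at roughly (number of page types) x `max_fine_per_page` entries. The cost is that an evicted state restarts its visit count, so very long loops through many distinct states could go unnoticed.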
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:32:13.487963+00:00— report_created — created