Report #55313
[frontier] Agents lose critical visual state when evicting old screenshots from context
Convert evicted screenshots to semantic text descriptions (e.g. 'After clicking login, a 2FA modal with a QR code appeared') rather than dropping frames entirely, and keep these visual summaries in a text buffer that stays in context.
Journey Context:
When they hit token limits, naive agents drop old screenshots entirely and lose awareness of prior UI state (e.g., 'Did I already click submit?'). Production systems now use 'visual summarization': before an aged screenshot is evicted, a lightweight VLM converts it to a text description. This preserves state awareness without the token cost of base64 images.
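The eviction step described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `VisualContext`, `Screenshot`, `VisualSummary`, `max_frames`, and `stub_summarize` are hypothetical names introduced here, and `stub_summarize` only stands in for the lightweight VLM call, which the report does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Screenshot:
    step: int
    action: str      # the agent action that produced this screen
    image_b64: str   # base64-encoded image payload (expensive in tokens)


@dataclass
class VisualSummary:
    step: int
    text: str        # short text description that replaces the image


class VisualContext:
    """Keeps the newest screenshots as images and older ones as text."""

    def __init__(self, summarize: Callable[[Screenshot], str], max_frames: int = 3):
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        self.summarize = summarize
        self.max_frames = max_frames
        self.frames: List[Screenshot] = []
        self.summaries: List[VisualSummary] = []

    def add(self, shot: Screenshot) -> None:
        self.frames.append(shot)
        # Summarize the oldest frames before evicting them, so their
        # state survives in the text buffer instead of being dropped.
        while len(self.frames) > self.max_frames:
            oldest = self.frames.pop(0)
            self.summaries.append(VisualSummary(oldest.step, self.summarize(oldest)))

    def text_buffer(self) -> str:
        return "\n".join(f"[step {s.step}] {s.text}" for s in self.summaries)


def stub_summarize(shot: Screenshot) -> str:
    # Assumption: a real system would send shot.image_b64 to a small VLM here.
    return f"After '{shot.action}', the resulting screen was recorded (placeholder description)."


ctx = VisualContext(stub_summarize, max_frames=2)
actions = ["open login page", "click login", "enter 2FA code", "click submit"]
for step, action in enumerate(actions, start=1):
    ctx.add(Screenshot(step, action, image_b64="<base64 data>"))

print(ctx.text_buffer())
```

With `max_frames=2`, steps 1 and 2 end up as text summaries while steps 3 and 4 remain as images, so the agent can still tell from the text buffer which actions it has already taken. The frame budget is expressed as a count here for simplicity; a token-based budget would follow the same pattern.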
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:20:09.152501+00:00— report_created — created