Report #82175
[frontier] Full-screen screenshot token bloat exhausts context windows on high-resolution displays
Implement semantic viewport tiling: detect interactive regions via accessibility tree or DOM layout, capture individual element screenshots \(Playwright locator.screenshot\), and discard static chrome \(wallpaper, browser toolbars\). Compress tiles to 512px longest edge before base64 encoding.
Journey Context:
Naive 1920x1080 screenshots consume ~2000\+ tokens \(GPT-4V\) or ~800k pixels \(Claude\). DOM-based element screenshots reduce tokens by 90% but miss global context. The emerging compromise is 'contextual tiling': capture the active element at high res, surrounding context at low res, and static regions as text descriptions \(AX roles\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:31:26.109311+00:00— report_created — created