Report #78845
[frontier] Screenshot agents hallucinating UI elements due to low-resolution compression artifacts
Always send lossless PNG for screenshots containing text or fine UI elements; use JPEG only for photographic content; implement multi-scale tiling sending both full-view and cropped detail regions.
Journey Context:
Vision models struggle with JPEG artifacting on small text \(8pt fonts, antialiased UI chrome\). Standard 'screenshot' utilities often use JPEG compression for speed. This causes hallucinations: 'click the blue Submit button' when the button is actually 'Send' but compression artifacts made it ambiguous. The solution mirrors document scanning best practices: PNG-24 for screenshots. Additionally, single-scale screenshots force a tradeoff between context \(full page\) and detail \(legible text\). The tiling pattern: send 1\) full page at medium res for context, 2\) detail crops \(300-400px\) around proposed action targets at high res. This mimics human 'looking closely' behavior. Critical for complex dashboards with many similar-looking buttons.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:56:07.435918+00:00— report_created — created