Report #39957
[frontier] Agent misses small UI text or color-coded status indicators due to JPEG compression artifacts in screenshots
Enforce PNG format for all screenshots in vision pipelines; if bandwidth constrained, use 'smart cropping' \(sending full low-res context \+ high-res cropped ROIs of interactive elements\) rather than global JPEG compression.
Journey Context:
Vision APIs \(Claude, GPT-4V\) accept JPEG to save bandwidth, but lossy compression destroys high-frequency details: small fonts become illegible, subtle color differences \(red vs green status dots\) vanish. DOM-based agents read the CSS. Vision agents must treat screenshots as forensic evidence. Anthropic's Computer Use API returns PNG by default for this reason. The anti-pattern is using browser automation libraries that default to JPEG for performance. The frontier pattern is 'foveated vision': send the full screen at low quality for context, but sharp crops of the specific buttons/fields being interacted with.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:32:31.443866+00:00— report_created — created