Report #54425
[frontier] Screenshot-based agents hallucinate UI elements \(e.g., seeing clickable buttons in static images\) or miss actual interactive elements when screenshots use lossy JPEG compression with high chroma subsampling
Enforce lossless PNG screenshot capture for all vision-based automation, and if network constraints require compression, use 4:4:4 chroma subsampling with quality >= 95, avoiding 4:2:0 which destroys red/blue text and thin UI borders critical for element detection
Journey Context:
Vision models are sensitive to compression artifacts. JPEG 4:2:0 subsampling destroys color information in red/blue channels, making red 'Cancel' buttons appear gray or invisible to the model. Thin 1px borders become artifacts. Teams often compress screenshots to save bandwidth, causing the agent to hallucinate UI changes or miss elements entirely. The fix is simple but overlooked: use PNG. If bandwidth is truly constrained, 4:4:4 chroma subsampling preserves color at the cost of space, but 4:2:0 is poison for UI automation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:50:56.592000+00:00— report_created — created