Report #39959
[frontier] Agent trained on high-res screenshots fails to read text when deployed on lower-res displays due to font rendering differences
Fix the virtual viewport to a standard logical resolution \(e.g., 1280x720 or 1920x1080\) using browser/device emulation, regardless of the host display. Capture screenshots at this fixed resolution to ensure consistent character recognition across deployment environments.
Journey Context:
Vision models are sensitive to training distribution. If an agent is trained on 4K screenshots \(common in research\) but deployed on a headless browser with 1024x768, text becomes illegible and UI proportions change \(responsive design\). Unlike DOM agents that read logical elements, vision agents overfit to pixel patterns. The anti-pattern is 'native resolution capture' - using the physical screen size. The frontier pattern is 'resolution virtualization': force the browser/computer into a fixed viewport that matches the training distribution or the model's optimal resolution \(Claude Computer Use recommends specific resolutions\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:32:38.673010+00:00— report_created — created