Report #47792
[frontier] Agents trained or prompted on 16:9 desktop screenshots fail on mobile viewports or tablet aspect ratios due to changed element scaling and layout flow
Enforce a canonical viewport (e.g., 1280x800) via browser emulation before vision processing. If mobile testing is required, use a separate agent profile with mobile viewport settings; never mix resolutions in the same session.
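One way to enforce the "never mix resolutions" rule is to check each screenshot's pixel size before it reaches the vision model. The following is a minimal sketch, not part of any agent framework: `SessionViewportGuard` and `png_size` are hypothetical names. It assumes screenshots are PNG bytes captured at a device scale factor of 1 and not as full-page captures, so pixel size equals viewport size.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(png_bytes: bytes) -> tuple[int, int]:
    """Read (width, height) from the PNG IHDR chunk without decoding the image."""
    if (
        len(png_bytes) < 24
        or png_bytes[:8] != PNG_SIGNATURE
        or png_bytes[12:16] != b"IHDR"
    ):
        raise ValueError("not a PNG image")
    # Width and height are two big-endian 32-bit integers at byte offsets 16..24.
    return struct.unpack(">II", png_bytes[16:24])


class SessionViewportGuard:
    """Hypothetical helper: rejects screenshots that do not match the session's viewport."""

    def __init__(self, expected: tuple[int, int]):
        self.expected = expected

    def check(self, png_bytes: bytes) -> tuple[int, int]:
        size = png_size(png_bytes)
        if size != self.expected:
            raise ValueError(
                f"screenshot is {size[0]}x{size[1]}, "
                f"session expects {self.expected[0]}x{self.expected[1]}"
            )
        return size
```

A session pinned to the canonical size would create `SessionViewportGuard((1280, 800))` once and call `check()` on every screenshot; a mobile profile would get its own guard in its own session.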
Journey Context:
Vision models have strong positional biases learned from training data. When shown a mobile screenshot (375px wide) after training on desktop screenshots (1920px wide), they expect elements such as a 'hamburger menu' at specific coordinates that don't exist, or they misread text scale. The fix isn't responsive design; it's viewport normalization. Use Playwright or CDP to set a fixed viewport size (e.g., 1280x800) regardless of the actual display, scale the page with a CSS transform if needed, and process that normalized view. This keeps the agent's 'visual vocabulary' consistent. This pattern emerged from OSWorld and VisualWebArena implementations that found aspect ratio was a bigger success factor than model size.
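The viewport normalization described above can be sketched with Playwright's Python sync API. This is a minimal illustration under stated assumptions, not a reference implementation: `ViewportProfile`, `context_options`, and `capture_normalized` are hypothetical names, the 1280x800 size is the example from this report, and the 667px mobile height is an illustrative choice. It uses Chromium because `is_mobile` emulation is not supported in every browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewportProfile:
    """Hypothetical per-agent profile; one profile per session, never mixed."""
    name: str
    width: int
    height: int
    device_scale_factor: float = 1.0  # 1.0 keeps screenshot pixels equal to CSS pixels
    is_mobile: bool = False


DESKTOP = ViewportProfile("desktop", 1280, 800)
MOBILE = ViewportProfile("mobile", 375, 667, is_mobile=True)  # height is an assumption


def context_options(profile: ViewportProfile) -> dict:
    """Build keyword arguments for Playwright's browser.new_context()."""
    return {
        "viewport": {"width": profile.width, "height": profile.height},
        "device_scale_factor": profile.device_scale_factor,
        "is_mobile": profile.is_mobile,
    }


def capture_normalized(url: str, profile: ViewportProfile = DESKTOP) -> bytes:
    """Load url in a fixed-size viewport and return the screenshot as PNG bytes."""
    # Imported here so the profile helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**context_options(profile))
        page = context.new_page()
        page.goto(url)
        png = page.screenshot()  # viewport-sized capture, not full_page
        browser.close()
    return png
```

Calling `capture_normalized("https://example.com")` would return a 1280x800 capture regardless of the host display. Mobile testing would pass `MOBILE` from a separate agent session instead of switching profiles mid-run.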
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:41:53.847699+00:00— report_created — created