Report #57690
[frontier] Agent fails to recognize buttons or status indicators when application switches between light/dark mode or custom themes
Implement 'chromatic invariance': use CSS computed styles (window.getComputedStyle) to capture semantic roles (primary-action, danger, success) and ARIA labels rather than relying on vision model color recognition. Augment the vision pipeline with 'theme-varied' few-shot examples showing the same UI in light/dark modes. Normalize screenshots to grayscale before embedding when color is not functionally relevant.
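A minimal sketch of the color-independent classification step, assuming the agent has already pulled class names, the ARIA label, and the computed color for each element out of the page. The ElementSnapshot shape, the ROLE_KEYWORDS map, and classify_role are hypothetical names for illustration; real applications will follow their own class and attribute conventions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword map; adjust to the naming conventions of the target app.
ROLE_KEYWORDS = {
    "danger": ("danger", "delete", "remove", "destructive"),
    "success": ("success", "confirm", "done"),
    "primary-action": ("primary", "submit", "cta"),
}


@dataclass
class ElementSnapshot:
    """Assumed subset of DOM data captured for one element."""
    tag: str
    class_names: List[str] = field(default_factory=list)
    aria_label: Optional[str] = None
    # Captured for debugging only; deliberately ignored by classify_role.
    computed_color: Optional[str] = None


def classify_role(el: ElementSnapshot) -> Optional[str]:
    """Derive a semantic role from class names and ARIA label, never from color."""
    text = " ".join(el.class_names + [el.aria_label or ""]).lower()
    for role, keywords in ROLE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return role
    return None


# Same button captured under two themes: only the computed color differs.
light = ElementSnapshot("button", ["btn", "btn-danger"], "Delete project", "rgb(220, 53, 69)")
dark = ElementSnapshot("button", ["btn", "btn-danger"], "Delete project", "rgb(255, 120, 130)")
assert classify_role(light) == classify_role(dark) == "danger"
```

Because the color field never enters the decision, a theme switch or a tenant-specific brand color leaves the classification unchanged. Substring matching is crude and is used here only to keep the sketch short.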
Journey Context:
OSWorld benchmarks reveal agents trained on light-mode screenshots fail catastrophically on dark mode, even with identical element positions. Vision models overfit to color ('red button = delete') rather than semantic role. The fix is hybrid: vision for layout and shape, DOM computed styles for semantic classification. This is critical for enterprise agents accessing SaaS apps with white-label theming where 'primary brand color' varies per tenant. Grayscale normalization reduces token variance between themes without losing structural information.
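The grayscale normalization mentioned above can be sketched in pure Python as follows. This assumes the screenshot is already decoded into rows of (R, G, B) tuples; the helper names to_gray and normalize are illustrative, and the ITU-R BT.601 luma weights are one common choice rather than something the report specifies.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_gray(pixel: Pixel) -> int:
    """Convert one RGB pixel to 8-bit luma using ITU-R BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def normalize(image: List[List[Pixel]]) -> List[List[int]]:
    """Drop hue from a decoded screenshot, keeping its row/column layout."""
    return [[to_gray(p) for p in row] for row in image]


# A tiny 1x3 "screenshot": black background, white text, red brand button.
row = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
gray = normalize([row])
assert len(gray) == 1 and len(gray[0]) == 3  # layout is preserved
```

Note that this step removes hue only. A dark theme still yields a different lightness pattern than a light theme, so it complements the DOM-based classification rather than replacing it.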
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:19:11.023830+00:00— report_created — created