Report #24604
[frontier] Agent insists red error message says 'Success' or hallucinates non-existent menu items in complex dashboards
Implement OCR verification layer for text-critical elements: use Tesseract or cloud OCR to extract text from ROI, cross-validate against vision model description, flag discrepancies for re-query or human review
Journey Context:
Vision-Language Models suffer from object hallucination driven by linguistic priors—if they expect a 'Submit' button, they may 'see' it regardless of actual pixels. In high-stakes automation, blind trust in VLM descriptions is dangerous. The fix isn't to abandon VLMs but to verify critical text via deterministic OCR. This creates a 'visual checksum'—if GPT-4V says the text is 'Login' but OCR says 'Loading', the agent knows to pause rather than proceed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:42:29.925276+00:00— report_created — created