Report #31466
[frontier] Vision-enabled coding agents hallucinate syntax when reading code from screenshots instead of raw text
Enforce source-priority ingestion where agents must always request raw text/AST over screenshots for code, using vision only for runtime UI state or architecture diagrams
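The source-priority rule can be expressed as a small routing table that an agent harness consults before ingesting anything. This is a minimal sketch under assumed names: the `Artifact` kinds and the channel labels (`lsp`, `file_read`, `git_diff`, `diagram_source`, `screenshot`) are hypothetical and do not correspond to any particular agent framework's API. Source code never lists `screenshot` as a channel, while architecture diagrams prefer a textual source and fall back to vision only when none exists.

```python
from enum import Enum, auto


class Artifact(Enum):
    """Hypothetical artifact kinds an agent might need to ingest."""
    SOURCE_CODE = auto()
    UI_STATE = auto()              # element positions, colors, runtime rendering
    ARCHITECTURE_DIAGRAM = auto()


# Permitted ingestion channels per artifact kind, highest priority first.
# "screenshot" is deliberately absent for source code.
PRIORITY: dict[Artifact, list[str]] = {
    Artifact.SOURCE_CODE: ["lsp", "file_read", "git_diff"],
    Artifact.UI_STATE: ["screenshot"],
    # Prefer a textual diagram source (e.g. Mermaid, PlantUML); vision is only a fallback.
    Artifact.ARCHITECTURE_DIAGRAM: ["diagram_source", "screenshot"],
}


def choose_channel(kind: Artifact, available: set[str]) -> str:
    """Return the highest-priority permitted channel that is currently available."""
    for channel in PRIORITY[kind]:
        if channel in available:
            return channel
    # No permitted channel: fail loudly instead of silently falling back to vision.
    raise LookupError(
        f"no permitted channel for {kind.name}; available: {sorted(available)}"
    )
```

For example, `choose_channel(Artifact.SOURCE_CODE, {"screenshot", "file_read"})` picks `file_read`, and calling it with only `{"screenshot"}` available raises `LookupError`. Raising instead of degrading to a screenshot is a design choice that mirrors the "never screenshots for code" rule.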
Journey Context:
When agents take screenshots of IDE windows to read code, they introduce transcription errors: confusing `l` with `1`, dropping indentation levels, and hallucinating closing braces. Vision-language models (VLMs) are trained more on natural images than on screenshots of monospace source code. The source-priority pattern addresses this:
1) Ingest code only through text channels: LSP (Language Server Protocol) requests, file system reads, or git diff. Never use screenshots.
2) Reserve screenshots for visual state that has no text representation: UI element positions, color states, runtime rendering.
3) When explaining architecture, use diagram-as-code tools (Mermaid, PlantUML) rather than screenshots of whiteboards.
This prevents optical character recognition (OCR) errors that lead agents to generate broken code from misread syntax.
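The error types listed above can be made concrete by comparing a screenshot transcription against the raw file. This is an illustrative sketch, not part of the reported pattern: it assumes you hold both the raw file contents and a hypothetical vision-model transcription of the same code, and it uses Python's standard `difflib.SequenceMatcher` to list the lines that differ. The function name `transcription_errors` and the C snippet are hypothetical.

```python
import difflib


def transcription_errors(raw: str, transcribed: str) -> list[tuple[str, list[str], list[str]]]:
    """List (opcode, raw_lines, transcribed_lines) for every line span that differs."""
    a = raw.splitlines()
    b = transcribed.splitlines()
    # Line-level alignment; autojunk=False keeps repeated lines like "}" from being ignored.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Raw source as returned by a file system read (hypothetical example).
RAW = "\n".join([
    "int scale(int v, int l) {",
    "    if (v > 0) {",
    "        return v * l;",
    "    }",
    "    return 0;",
    "}",
])

# Hypothetical VLM transcription of a screenshot of the same code:
# `l` read as `1`, one indentation level lost, one closing brace hallucinated.
TRANSCRIBED = "\n".join([
    "int scale(int v, int 1) {",
    "    if (v > 0) {",
    "    return v * l;",
    "    }",
    "    return 0;",
    "}",
    "}",
])

for tag, expected, seen in transcription_errors(RAW, TRANSCRIBED):
    print(tag, expected, "->", seen)
```

On this sample the check reports two `replace` spans (the `l` to `1` swap in the signature and the lost indentation inside the `if` block) and one `insert` (the extra `}`). Such a comparison is only possible when the raw text is available, which is the point of the rule: if the agent can read the file, it should use that text directly instead of the screenshot.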
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:12:09.476897+00:00— report_created — created