Report #97620
[frontier] My coding agent cannot turn UI screenshots or control-flow graphs from bug reports into precise patches
Convert heterogeneous visual artifacts (UI screenshots, control-flow graphs) into a structured semantic scene graph of GUI elements and their relations before the coding agent reasons over them. Then iteratively crop to the bug-centered region to suppress irrelevant visual noise.
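A minimal sketch of the scene graph and the iterative crop, under assumptions that are not part of this report: the `Element` and `SceneGraph` classes, their fields, and the `crop_to_bug` helper are hypothetical names chosen for illustration, and the detector that would produce element boxes from a real screenshot is left out. It only shows how a scene graph can be held as plain data and how a viewport can be narrowed in stages toward the elements flagged as bug-related.

```python
from dataclasses import dataclass, field

# (left, top, right, bottom) in pixels; hypothetical convention for this sketch
Box = tuple[int, int, int, int]


@dataclass
class Element:
    id: str
    role: str   # e.g. "button", "label"
    text: str
    box: Box


@dataclass
class SceneGraph:
    elements: dict[str, Element] = field(default_factory=dict)
    # Each relation is (subject_id, predicate, object_id), e.g. ("error", "below", "submit")
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, el: Element) -> None:
        self.elements[el.id] = el

    def relate(self, subj: str, pred: str, obj: str) -> None:
        self.relations.append((subj, pred, obj))


def union_box(boxes: list[Box]) -> Box:
    """Smallest box that contains every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def crop_to_bug(graph: SceneGraph, bug_ids: list[str], full: Box,
                steps: int = 3, margin: int = 8):
    """Narrow the viewport from `full` to the bug elements in `steps` stages.

    Returns one (crop_box, visible_element_ids) pair per stage, so a caller
    can stop at the first stage that drops enough unrelated elements.
    """
    target = union_box([graph.elements[i].box for i in bug_ids])
    # Pad the target by `margin`, clamped to the full image.
    target = (
        max(full[0], target[0] - margin),
        max(full[1], target[1] - margin),
        min(full[2], target[2] + margin),
        min(full[3], target[3] + margin),
    )
    stages = []
    for t in range(1, steps + 1):
        # Linear interpolation from the full box toward the target box.
        box = tuple(f + (g - f) * t // steps for f, g in zip(full, target))
        visible = sorted(e.id for e in graph.elements.values()
                         if contains(box, e.box))
        stages.append((box, visible))
    return stages
```

Each stage lists only the elements that remain fully inside the crop, so unrelated regions such as a navigation bar or footer drop out of the graph as the viewport tightens. Which elements count as bug-related is assumed to come from elsewhere, for example by matching element text against the bug report.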
Journey Context:
SVRepair demonstrates that feeding raw screenshots directly into an MLLM causes context loss and hallucination. It instead uses a dedicated visual-representation model to normalize screenshots and graphs into code-relevant scene graphs, which lifts multimodal program-repair accuracy on SWE-Bench M. The pattern generalizes: visual input should be converted into a structured form before it reaches the code-generation model.
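To illustrate what "pre-structured" can mean in practice, here is a small sketch that renders a scene graph as text and places it in the repair prompt in place of the raw image. The dictionary keys, the function names `render_scene_graph` and `build_repair_prompt`, and the prompt wording are all assumptions made for this example, not the format used by SVRepair.

```python
def render_scene_graph(elements, relations):
    """Render elements and relations as deterministic, line-oriented text.

    `elements` is a list of dicts with hypothetical keys id, role, text, box;
    `relations` is a list of (subject_id, predicate, object_id) tuples.
    """
    lines = ["ELEMENTS"]
    # Sort by id so the same screenshot always yields the same prompt text.
    for el in sorted(elements, key=lambda e: e["id"]):
        lines.append(
            f'- {el["id"]}: {el["role"]} "{el["text"]}" at {tuple(el["box"])}'
        )
    lines.append("RELATIONS")
    for subj, pred, obj in relations:
        lines.append(f"- {subj} {pred} {obj}")
    return "\n".join(lines)


def build_repair_prompt(issue_text, elements, relations):
    """Combine the bug report with the structured view of its screenshot."""
    return (
        "Bug report:\n" + issue_text.strip() + "\n\n"
        "Structured view of the attached screenshot:\n"
        + render_scene_graph(elements, relations) + "\n\n"
        "Propose a patch that addresses the reported behavior."
    )


elements = [
    {"id": "submit", "role": "button", "text": "Save", "box": (300, 200, 400, 240)},
    {"id": "error", "role": "label", "text": "Invalid date", "box": (300, 250, 500, 270)},
]
relations = [("error", "below", "submit")]
prompt = build_repair_prompt(
    "Saving shows 'Invalid date' for valid input.", elements, relations
)
print(prompt)
```

The code-generation model then receives element roles, visible text, and spatial relations as tokens it can match against identifiers and strings in the repository, instead of having to infer them from pixels.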
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:25:23.781998+00:00— report_created — created