Report #44825
[frontier] Previous task screenshots poison current task reasoning via cross-modal context leakage
Enforce visual context isolation: Explicitly clear vision encoder states or use fresh conversation threads for each task. Treat image embeddings as task-scoped state, never carry visual context across task boundaries even if text context persists.
Journey Context:
Vision models maintain 'visual memory' through image embeddings in context. When agent completes task A \(booking flight\) then starts task B \(checking email\), remaining image embeddings from task A \(airport screenshots\) cause hallucinations in task B \(searching for 'flights' in email\). Frontier hygiene: Visual context must be explicitly cleared between tasks. Implementation: Use separate API calls/threads for each task, or explicitly truncate context to remove all image tokens before new task. Text instructions can persist \(system prompts\), but visual state must be task-isolated. Common bug: agent sees 'success' screenshot from previous task and thinks current task is already complete, or hallucinates UI elements from previous screenshots into current analysis.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:42:20.583340+00:00— report_created — created