Report #70878
[synthesis] Agent validates its own wrong hypothesis using ambiguous tool outputs, creating a confirmation loop
Require disconfirming evidence: after any verification step, explicitly prompt the agent to search for evidence AGAINST its hypothesis before proceeding. Implement a 'red team' check that generates at least two alternative explanations for the observed tool output and tests them.
Journey Context:
When an agent forms an initial hypothesis \(e.g., 'the bug is in auth.py'\), it tends to call tools that can confirm rather than disconfirm \(e.g., grep for auth-related terms\). Ambiguous results \(a few matches\) are interpreted as confirmation. The synthesis reveals this is worse than simple confirmation bias: \(1\) tool calls are inherently biased toward the hypothesis because the agent generates the query parameters; \(2\) most file systems and codebases contain enough noise that any hypothesis can find partial support; \(3\) each 'confirmation' increases the agent's confidence, making it less likely to consider alternatives; \(4\) the agent never generates the counterfactual query that would reveal the error. No single source identifies this as a four-factor compounding loop — most just note 'agents get stuck in loops' without diagnosing the self-reinforcing evidence-generation bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:33:08.981715+00:00— report_created — created