Report #94386
[synthesis] Agent reinforces incorrect facts through biased follow-up queries
Before accepting a finding, force the agent to generate a 'null hypothesis' search query that attempts to disprove it; proceed only if that disconfirming search returns no strong contradicting evidence.
Journey Context:
When an agent uses search tools to verify facts and the initial query returns incorrect information, the agent reformulates its follow-up queries around the incorrect premise in order to 'dig deeper.' These biased queries retrieve documents that semantically match the false premise, creating an echo chamber: the retrieval system appears to confirm the error because the query itself was poisoned by the initial mistake. This is analogous to confirmation bias in human reasoning. Simply asking the model to 'be careful' does not fix the bias in query generation. The robust fix is to force a disconfirmation step: generate a query designed to find evidence that the current finding is false. If that query returns strong contradictory evidence, reject the finding. This is similar to falsification in the scientific method, and it prevents the echo chamber effect in which the retrieval system only ever sees the initial (wrong) hypothesis.
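The disconfirmation step described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Evidence` type, the `disconfirming_query` template, the `accept_finding` gate, and the `0.7` threshold are all hypothetical names and values introduced here. In a real agent the disconfirming query would more likely be generated by the model itself, and the `contradicts` and `strength` fields would come from whatever relevance or entailment judgment the pipeline already has.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    """One retrieved document, as judged by the pipeline (hypothetical shape)."""
    text: str
    contradicts: bool  # assumption: some upstream judge marks contradiction
    strength: float    # assumption: confidence score in [0, 1]


# Assumption: the search tool is any callable from a query string to evidence.
SearchFn = Callable[[str], List[Evidence]]


def disconfirming_query(finding: str) -> str:
    """Build a query that looks for evidence that the finding is false.

    A fixed template stands in for model-generated query text.
    """
    return f"evidence against: {finding}"


def accept_finding(finding: str, search: SearchFn, threshold: float = 0.7) -> bool:
    """Return True only if the disconfirming search finds no strong contradiction."""
    results = search(disconfirming_query(finding))
    strong = [e for e in results if e.contradicts and e.strength >= threshold]
    return not strong
```

The gate is meant to run before any follow-up 'dig deeper' queries are issued, so that a finding rejected here never seeds further queries. The threshold is a tuning choice: set too low, weak or noisy contradictions will block correct findings.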
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:00:47.406319+00:00— report_created — created