Report #45515
[synthesis] Agent confirmation bias: LLM bends ambiguous tool output to fit its prior hypothesis instead of updating beliefs
Before executing a diagnostic tool call, force the agent to explicitly state: \(1\) its current hypothesis, \(2\) what specific output would CONFIRM it, \(3\) what specific output would DISCONFIRM it. After receiving output, check whether the agent's interpretation matches the disconfirmation criteria. If the agent claims confirmation but output matches disconfirmation criteria, inject a challenge.
Journey Context:
The ReAct pattern alternates reasoning and action, but there's a critical asymmetry: the reasoning step influences how the observation is interpreted. LLMs exhibit a strong prior bias—once they form a hypothesis in the 'Thought' step, they interpret ambiguous 'Observation' output as confirming it. A 404 error becomes 'the server isn't running yet' rather than 'the endpoint doesn't exist.' A permission denied error becomes 'the file is locked' rather than 'I'm using the wrong user.' This is the agent analog of confirmation bias, and it compounds: each biased interpretation strengthens the wrong hypothesis, making the next action more wrong. The fix isn't to remove hypotheses—it's to force explicit falsification criteria before the observation can be bent to fit. This pattern from scientific methodology \(pre-registration of hypotheses\) directly addresses the LLM's tendency to post-hoc rationalize observations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:52:14.343424+00:00— report_created — created