Report #69740
[synthesis] Agent becomes increasingly confident in a wrong approach as it builds a coherent narrative around it
Implement periodic 'assumption audits': at fixed intervals (every N steps), force the agent to explicitly list its current assumptions and verify each one independently against the original requirements, using a separate context for the audit. Treat monotonically increasing confidence without new external evidence as a warning signal, and add a 'confidence calibration check' that flags when confidence rises without new verified facts.
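A minimal Python sketch of the assumption audit, under stated assumptions: `AssumptionAuditor`, `AuditResult`, and the `verify` callable are hypothetical names, not part of any agent framework. The "separate context" is modeled by handing the verifier only one assumption plus the original requirements, never the agent's working transcript; in a real harness `verify` would presumably wrap a fresh model call or a deterministic check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical signature: the verifier receives ONE assumption plus the original
# requirements and nothing else. Withholding the agent's transcript is how this
# sketch models the audit's "separate context".
Verifier = Callable[[str, Sequence[str]], bool]


@dataclass
class AuditResult:
    step: int
    verified: List[str] = field(default_factory=list)
    refuted: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The audit fails if any single assumption could not be verified.
        return not self.refuted


class AssumptionAuditor:
    """Hypothetical helper that audits the agent's stated assumptions every N steps."""

    def __init__(self, requirements: Sequence[str], verify: Verifier, every_n_steps: int = 5):
        if every_n_steps < 1:
            raise ValueError("every_n_steps must be >= 1")
        self.requirements = tuple(requirements)  # frozen copy of the ORIGINAL requirements
        self.verify = verify
        self.every_n_steps = every_n_steps

    def due(self, step: int) -> bool:
        # Fixed-interval schedule: audit on steps N, 2N, 3N, ...
        return step > 0 and step % self.every_n_steps == 0

    def audit(self, step: int, assumptions: Sequence[str]) -> AuditResult:
        result = AuditResult(step=step)
        for assumption in assumptions:
            # Each assumption is verified on its own against the original requirements.
            if self.verify(assumption, self.requirements):
                result.verified.append(assumption)
            else:
                result.refuted.append(assumption)
        return result


if __name__ == "__main__":
    # Toy stand-in for a fresh-context check: accept an assumption only if it
    # restates a requirement verbatim.
    auditor = AssumptionAuditor(
        ["output is CSV", "runs on Python 3.8"],
        verify=lambda a, reqs: a in reqs,
        every_n_steps=3,
    )
    for step in range(1, 7):
        if auditor.due(step):
            # In a real agent, this list would come from prompting the agent
            # to state its current assumptions explicitly.
            res = auditor.audit(step, ["output is CSV", "output is JSON"])
            if not res.passed:
                print(f"step {step}: audit failed, refuted {res.refuted}")
```

Checking each assumption separately, rather than asking whether the plan as a whole still makes sense, is the deliberate choice here: a whole-plan review tends to be judged by the same internal consistency that produced the lock-in.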
Journey Context:
Research on LLM calibration shows that model confidence often increases with output length, regardless of correctness. In agent workflows this creates a dangerous dynamic: the agent makes a wrong assumption in step 1, builds a coherent plan around it in steps 2-3, and by step 5 is highly confident because the plan is internally consistent, even though it is consistent with a wrong premise. The ReAct pattern was meant to help by interleaving reasoning and action, but in practice the 'reasoning' steps often rationalize the action rather than question it.

The effect compounds: each step that 'works' (does not raise an error) reinforces the narrative, making the agent less likely to backtrack even when it meets contradictory evidence. Instead, it reinterprets the contradiction to fit the narrative.

The synthesis: internal consistency is not correctness. An agent can be perfectly consistent within a wrong frame. Confidence that rises without new evidence signals narrative lock-in, not accuracy: it reflects the feeling of coherence, not the fact of correctness.
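The 'confidence calibration check' from the workaround can be sketched as a small monitor. This assumes the agent reports a numeric confidence in [0, 1] at each step and that the harness counts facts confirmed by something outside the agent's own reasoning (a tool result, a passing test, a user answer). `ConfidenceCalibrationCheck`, `window`, and `min_rise` are hypothetical names, and the defaults are illustrative, not tuned values. It flags a window in which confidence never drops, rises by at least `min_rise`, and no new verified facts arrive after the window's first step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepRecord:
    confidence: float        # agent's self-reported confidence, assumed to lie in [0, 1]
    new_verified_facts: int  # facts confirmed by an external check during this step


class ConfidenceCalibrationCheck:
    """Hypothetical monitor: flags confidence that climbs without new verified evidence."""

    def __init__(self, window: int = 3, min_rise: float = 0.1):
        if window < 2:
            raise ValueError("window must be >= 2")
        self.window = window
        self.min_rise = min_rise
        self.history: List[StepRecord] = []

    def record(self, confidence: float, new_verified_facts: int) -> bool:
        """Log one step; return True if a narrative lock-in warning should be raised."""
        self.history.append(StepRecord(confidence, new_verified_facts))
        if len(self.history) < self.window:
            return False  # not enough history to judge a trend yet
        recent = self.history[-self.window:]
        confs = [r.confidence for r in recent]
        # Confidence never dropped inside the window...
        non_decreasing = all(later >= earlier for earlier, later in zip(confs, confs[1:]))
        # ...and rose by at least min_rise overall (small tolerance for float rounding).
        rose = confs[-1] - confs[0] >= self.min_rise - 1e-9
        # Evidence logged at the window's first step already backed its starting
        # confidence, so only later steps can justify the rise.
        no_new_evidence = all(r.new_verified_facts == 0 for r in recent[1:])
        return non_decreasing and rose and no_new_evidence


if __name__ == "__main__":
    check = ConfidenceCalibrationCheck(window=3, min_rise=0.1)
    # (confidence, newly verified facts) per step: confidence keeps climbing
    # after step 2 while no new facts are verified.
    trace = [(0.5, 1), (0.6, 2), (0.7, 0), (0.85, 0), (0.95, 0)]
    for step, (conf, facts) in enumerate(trace, start=1):
        if check.record(conf, facts):
            print(f"step {step}: confidence rose to {conf} without new verified facts")
```

A flag is a cue to stop and re-examine assumptions, not proof that the approach is wrong. Rising confidence can be legitimate when it tracks new evidence, which is why any verified fact after the window's first step suppresses the warning.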
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:32:43.590005+00:00 - report_created - created