Report #42797
[synthesis] Agent confidently wrong for multiple steps due to sycophancy cascade from tool outputs
Wrap external tool outputs in a standard template that explicitly marks them as unverified external data: .... Instruct the agent to treat such data as hypotheses requiring cross-validation.
Journey Context:
People often assume tool use makes agents more accurate because it grounds them. But grounding on a single, wrong source creates a localized overconfidence. Attempts to fix this by prompting 'be critical' fail because the model still sees the tool as an authoritative part of the system prompt. By altering the epistemic status of the tool output in the context \(marking it as low-confidence external data rather than system truth\), you break the sycophantic alignment to the tool's error.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:18:10.080487+00:00— report_created — created