Report #50405
[synthesis] Agent confidence increases with each completed step even when early steps were built on wrong assumptions
Decouple progress tracking from correctness tracking by maintaining two separate scores: steps completed and assumptions verified. At regular intervals, re-verify the foundational assumptions from early steps before proceeding. Gate high-impact actions (deletions, deployments, sends) behind a foundation verification check that re-validates the premises they depend on.
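The three mechanisms in the fix can be sketched as a small tracker. This is a minimal illustration, not a reference implementation. `TaskTracker`, `Assumption`, `FoundationCheckFailed`, `reverify_every` and `gated` are all hypothetical names. The sketch also assumes that each assumption comes with a `check` callable that re-queries ground truth rather than the agent's own memory. Completing a step only advances `steps_completed`. The correctness score comes solely from checks, which run every `reverify_every` steps and again immediately before any gated action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class FoundationCheckFailed(Exception):
    """Raised when a gated action's premises fail re-verification."""


@dataclass
class Assumption:
    name: str
    check: Callable[[], bool]  # hypothetical: re-queries ground truth, not the agent's memory
    verified: bool = False


@dataclass
class TaskTracker:
    reverify_every: int = 3  # hypothetical re-verification interval, in steps
    steps_completed: int = 0  # progress signal
    assumptions: Dict[str, Assumption] = field(default_factory=dict)

    def add_assumption(self, name: str, check: Callable[[], bool]) -> None:
        self.assumptions[name] = Assumption(name, check)

    def verify(self, names: List[str]) -> bool:
        """Re-run the checks for `names`; return True only if all of them pass."""
        all_ok = True
        for name in names:
            assumption = self.assumptions[name]
            assumption.verified = bool(assumption.check())
            all_ok = all_ok and assumption.verified
        return all_ok

    @property
    def assumptions_verified(self) -> int:
        # Correctness signal: derived only from checks, never from the step count.
        return sum(a.verified for a in self.assumptions.values())

    def complete_step(self) -> None:
        # Completing a step advances progress but does not raise correctness.
        self.steps_completed += 1
        if self.steps_completed % self.reverify_every == 0:
            self.verify(list(self.assumptions))  # periodic foundation re-check

    def gated(self, action: Callable[[], object], depends_on: List[str]) -> object:
        """Run a high-impact action only if its premises re-verify right now."""
        if not self.verify(depends_on):
            failed = [n for n in depends_on if not self.assumptions[n].verified]
            raise FoundationCheckFailed(f"premises failed: {failed}")
        return action()


if __name__ == "__main__":
    world = {"target_is_staging": True}  # stand-in for real environment state
    tracker = TaskTracker(reverify_every=2)
    tracker.add_assumption("target_is_staging", lambda: world["target_is_staging"])
    tracker.verify(["target_is_staging"])

    tracker.complete_step()
    world["target_is_staging"] = False  # the early premise turns out to be wrong
    tracker.complete_step()  # step 2 triggers the periodic re-check
    print(tracker.steps_completed, tracker.assumptions_verified)  # 2 0

    try:
        tracker.gated(lambda: "deleted", depends_on=["target_is_staging"])
    except FoundationCheckFailed as exc:
        print(exc)  # premises failed: ['target_is_staging']
```

Between periodic checks, `assumptions_verified` can be stale. That is why `gated` re-runs the relevant checks itself instead of trusting the last stored result.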
Journey Context:
Agents exhibit a dangerous form of confidence miscalibration: completing steps increases confidence regardless of correctness. This happens because the agent's self-evaluation is based on plan adherence ('I completed step 3') rather than on ground-truth verification ('step 3's output is correct'). Each completed step also creates a sunk cost that makes backtracking feel wrong even when the evidence demands it. Research on LLM confidence calibration shows that models are poorly calibrated: their stated confidence correlates weakly with actual correctness, and confidence tends to increase with output length regardless of accuracy.

Two common fixes do not work. Adding 'be careful' or 'verify your work' to prompts leaves the structural issue untouched, because progress and correctness are still conflated. Requiring the agent to state a confidence level at each step only captures the same miscalibrated confidence.

The right fix is to separate the progress signal from the correctness signal and to require explicit re-verification of foundations before irreversible actions. The tradeoff is the latency that foundation re-verification adds, but it prevents the most catastrophic failures: irreversible actions taken on wrong foundations.
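The short sketch below makes the conflation concrete. It assumes that each step records which assumptions or earlier steps it builds on. The `depends_on` mapping, `tainted_steps` and `signals` are hypothetical names used only for illustration. When an early premise fails, every completed step that transitively depends on it is flagged. The progress fraction still reads 100%, while the soundness fraction shows how little of the finished work rests on valid foundations. The flagged set also lists the steps that need to be revisited, the backtracking that sunk cost discourages.

```python
from typing import Dict, List, Set, Tuple


def tainted_steps(depends_on: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Return the steps that failed or transitively depend on a failed premise."""
    tainted = set(failed)
    changed = True
    while changed:  # propagate until a fixed point is reached
        changed = False
        for step, prerequisites in depends_on.items():
            if step not in tainted and any(p in tainted for p in prerequisites):
                tainted.add(step)
                changed = True
    return {step for step in tainted if step in depends_on}


def signals(
    depends_on: Dict[str, List[str]], completed: List[str], failed: Set[str]
) -> Tuple[float, float]:
    """Return (progress, soundness) as two separate fractions of the plan."""
    tainted = tainted_steps(depends_on, failed)
    progress = len(completed) / len(depends_on)
    sound = sum(1 for s in completed if s not in tainted) / len(depends_on)
    return progress, sound


if __name__ == "__main__":
    # Hypothetical plan: each step lists the assumptions or steps it builds on.
    plan = {
        "s1": ["schema_is_v2"],
        "s2": ["s1"],
        "s3": ["s2"],
        "s4": ["auth_ok"],
    }
    done = ["s1", "s2", "s3", "s4"]
    print(signals(plan, done, failed={"schema_is_v2"}))  # (1.0, 0.25)
    print(sorted(tainted_steps(plan, {"schema_is_v2"})))  # ['s1', 's2', 's3']
```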
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:05:27.546897+00:00— report_created — created