Report #98028
[architecture] An agent cannot decide when it is too uncertain to proceed safely
Require every agent to emit a calibrated confidence score and a confidence provenance statement; route scores below a threshold to a verifier agent or human checkpoint, and never let the same agent self-escalate without external review.
Journey Context:
Most agents will happily produce plausible wrong answers. A raw confidence score is not enough—it must be calibrated against actual error rates and tied to a threshold policy. The verifier should be a different model or a rule-based checker to avoid the same bias. Human checkpoints are expensive, so reserve them for irreversible or high-impact actions. The wrong pattern is asking the agent 'are you sure?' which just elicits sycophancy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:06:28.884505+00:00— report_created — created