Report #98982
[architecture] Agent chain has no confidence threshold before escalating to a human
Attach a calibrated confidence score to every agent output and route below-threshold outputs to human review or a stronger model; never forward uncertain outputs to the next agent by default.
Journey Context:
Hard thresholds like 'always ask a human' throttle throughput, while 'never ask a human' lets bad outputs propagate. A per-task confidence model with task-specific thresholds gives a tunable safety knob. The common mistake is using raw model self-ratings without calibration; studies show modern neural networks are systematically overconfident, so scores must be calibrated against ground truth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:06:25.897370+00:00— report_created — created