Report #100378
[architecture] No confidence threshold or escalation rule, so low-confidence agent outputs silently propagate
Attach a calibrated confidence score and an uncertainty estimate to every agent output. Define hard thresholds with three outcomes: above the upper threshold the next agent proceeds automatically; below the lower threshold the output is queued for human review or routed to a more capable model; in the dead zone between them the chain pauses and asks before continuing.
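A minimal sketch of the routing rule, reading it as two cut-offs with a dead zone between them. The names AgentOutput, Action and route, and the default thresholds 0.6 and 0.85, are hypothetical placeholders for illustration, not values taken from this report.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"    # hand off to the next agent automatically
    PAUSE = "pause"        # dead zone: stop the chain and ask
    ESCALATE = "escalate"  # human review or a more capable model


@dataclass(frozen=True)
class AgentOutput:
    content: str
    confidence: float   # assumed to be a calibrated score in [0, 1]
    uncertainty: float  # carried along for reviewers; not used for routing here


def route(output: AgentOutput, lower: float = 0.6, upper: float = 0.85) -> Action:
    """Map a confidence score to one of three actions.

    The thresholds are placeholder assumptions; in practice they would be
    derived from an error-cost analysis.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if output.confidence >= upper:
        return Action.PROCEED
    if output.confidence < lower:
        return Action.ESCALATE
    return Action.PAUSE
```

Rejecting out-of-range scores with an exception, rather than clamping them, keeps a miscomputed confidence from passing silently as a valid one.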
Journey Context:
Agents are not uniformly uncertain: they are confidently wrong in some regions and usefully uncertain in others. A binary success/fail flag misses this. The score should come from the model (log-probs where available), from external validators, or from consistency across multiple samples. The real design work is the threshold and the action attached to it: auto-approve, escalate, or stop. Set thresholds from an actual error-cost analysis, not guesswork. A common failure is over-reliance on the LLM's own calibration; it is often poorly calibrated, so combine it with rule-based checks and human baselines. The tradeoff is friction: with too many escalations humans become the bottleneck; with too few, bad outputs slip through.
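The sketch below illustrates three of the points above under stated assumptions: a sample-agreement score as one source of confidence, a rule-based check that overrides the model's own score, and an auto-approve threshold derived from costs. All function names are hypothetical. The cost rule assumes the score is calibrated, so that an auto-approved output with confidence p is wrong with probability 1 - p; auto-approval is then cheaper than review when (1 - p) * cost_of_error < cost_of_review.

```python
from collections import Counter


def consistency_score(samples: list[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    if not samples:
        raise ValueError("need at least one sample")
    # Light normalisation so trivial formatting differences do not count as disagreement.
    counts = Counter(s.strip().lower() for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def combined_confidence(score: float, validator_results: list[bool]) -> float:
    """Let rule-based checks veto the score: any failed validator forces 0.0.

    This veto policy is an assumption; a weighted combination is another option.
    """
    return score if all(validator_results) else 0.0


def auto_approve_threshold(cost_of_review: float, cost_of_error: float) -> float:
    """Smallest confidence at which skipping review has lower expected cost.

    Solves (1 - p) * cost_of_error < cost_of_review for p and clamps to [0, 1].
    """
    if cost_of_review < 0 or cost_of_error <= 0:
        raise ValueError("cost_of_review must be >= 0 and cost_of_error > 0")
    return min(1.0, max(0.0, 1.0 - cost_of_review / cost_of_error))


if __name__ == "__main__":
    answer, agreement = consistency_score(["42", "42", " 42 ", "41"])
    confidence = combined_confidence(agreement, [True, True])
    threshold = auto_approve_threshold(cost_of_review=5.0, cost_of_error=100.0)
    print(answer, confidence, threshold, confidence >= threshold)
```

With these illustrative costs a review is 20 times cheaper than an error, so only outputs with confidence of about 0.95 or higher would skip review. Raising the review cost lowers the threshold, which is the friction tradeoff expressed in numbers.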
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:07:23.753135+00:00— report_created — created