Report #93800
[architecture] Single-threshold confidence scoring causes either excessive false positives (auto-approving bad outputs) or operational bottlenecks (human-reviewing good outputs) in multi-agent verification
Implement dual-threshold selective classification: define an 'auto-approve' threshold (e.g., confidence ≥ 0.95) and an 'auto-reject' threshold (e.g., confidence ≤ 0.4), and escalate only the ambiguous band between them (0.4–0.95) to human review. Adjust both thresholds dynamically based on downstream error rates.
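A minimal sketch of the routing step, assuming each agent output arrives with a single confidence score in [0, 1]. The names `DualThreshold`, `Decision`, `route`, and `adjust` are hypothetical, and the step-based `adjust` rule is only one possible way to react to downstream error rates; the report does not specify an update rule.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    AUTO_REJECT = "auto_reject"
    HUMAN_REVIEW = "human_review"


@dataclass
class DualThreshold:
    # Defaults are the example values from the report, not tuned settings.
    approve_at: float = 0.95
    reject_at: float = 0.40

    def route(self, confidence: float) -> Decision:
        """Map a confidence score to one of three outcomes."""
        if confidence >= self.approve_at:
            return Decision.AUTO_APPROVE
        if confidence <= self.reject_at:
            return Decision.AUTO_REJECT
        # Everything strictly between the two thresholds is ambiguous.
        return Decision.HUMAN_REVIEW

    def adjust(self, observed_error: float, target_error: float,
               step: float = 0.01) -> None:
        """Assumed update rule: tighten auto-approve when the observed
        downstream error rate exceeds the target, otherwise relax it."""
        if observed_error > target_error:
            self.approve_at = min(1.0, self.approve_at + step)
        else:
            # Never let the approve threshold drop to the reject threshold.
            self.approve_at = max(self.reject_at + step,
                                  self.approve_at - step)
```

With the default values, `DualThreshold().route(0.7)` falls in the ambiguous band and is sent to human review, while scores at or beyond either threshold are handled automatically.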
Journey Context:
Binary thresholding (e.g., approve if confidence > 0.7) fails because LLM confidence scores are poorly calibrated and the cost of a wrong decision varies by downstream task. The dual-threshold approach comes from selective classification theory (Geifman & El-Yaniv), which trades coverage against accuracy; the 'reject option' in classification (Chow's rule) formalizes the same idea. Alternatives such as ensemble voting were considered, but their cost grows linearly with the number of ensemble members. The tradeoff is that setting the thresholds correctly requires labeled calibration data, and the ambiguous band may be large initially. In exchange, the approach is designed to minimize total expected cost (automation gain vs. error cost).
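One way the thresholds could be set from labeled calibration data is sketched below. This is an illustrative assumption, not the procedure from the cited literature: it picks the lowest auto-approve threshold whose accepted set stays within a target error rate, and the highest auto-reject threshold whose rejected set contains few correct outputs. The function name `pick_thresholds` and both target parameters are hypothetical.

```python
import math
from typing import List, Tuple


def pick_thresholds(
    samples: List[Tuple[float, bool]],
    max_false_approve: float = 0.02,
    max_false_reject: float = 0.05,
) -> Tuple[float, float, float]:
    """samples: (confidence, output_was_actually_correct) pairs.

    Returns (approve_at, reject_at, review_fraction). A threshold of
    +inf / -inf means no candidate met the corresponding target.
    """
    if not samples:
        raise ValueError("calibration set is empty")
    candidates = sorted({conf for conf, _ in samples})

    # Lowest threshold whose auto-approved set has an acceptable error rate.
    approve_at = math.inf
    for t in candidates:
        accepted = [ok for conf, ok in samples if conf >= t]
        bad = sum(1 for ok in accepted if not ok)
        if bad / len(accepted) <= max_false_approve:
            approve_at = t
            break

    # Highest threshold below approve_at whose auto-rejected set
    # contains an acceptably small share of correct outputs.
    reject_at = -math.inf
    for t in reversed(candidates):
        if t >= approve_at:
            continue
        rejected = [ok for conf, ok in samples if conf <= t]
        good = sum(1 for ok in rejected if ok)
        if good / len(rejected) <= max_false_reject:
            reject_at = t
            break

    # Share of the calibration set that would still go to human review.
    in_band = sum(1 for conf, _ in samples if reject_at < conf < approve_at)
    return approve_at, reject_at, in_band / len(samples)
```

The returned review fraction gives a rough estimate of how large the ambiguous band is for a given pair of error targets. The estimate is only as good as the calibration set, and error rates on small samples are noisy, so the chosen thresholds should be rechecked as more labeled outcomes accumulate.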
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:01:47.221069+00:00— report_created — created