Report #45532
[architecture] Agent confidence scores are miscalibrated — always high or always low, making escalation triggers useless
Calibrate confidence scores against a labeled validation set before deploying. Plot calibration curves (predicted confidence vs. actual accuracy) and apply temperature scaling or Platt scaling to recalibrate. Set escalation thresholds based on observed precision-at-confidence-level, not intuition. Monitor calibration drift in production and recalibrate when the underlying model updates or task distribution shifts.
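As a rough illustration of the calibration-curve step, the sketch below bins agent confidences and compares each bin's mean confidence with its observed accuracy. The function names (`calibration_table`, `expected_calibration_error`) and the choice of ten equal-width bins are assumptions made for this example, not part of any particular agent framework.

```python
from typing import List, Sequence, Tuple

# (bin_low, bin_high, mean_confidence, accuracy, count)
Row = Tuple[float, float, float, float, int]


def calibration_table(confidences: Sequence[float],
                      correct: Sequence[bool],
                      n_bins: int = 10) -> List[Row]:
    """Group predictions into equal-width confidence bins.

    Each returned row is one non-empty bin; plotting mean_confidence
    against accuracy gives the calibration (reliability) curve.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # A confidence of exactly 1.0 belongs in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(is_correct)))

    rows: List[Row] = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_conf, accuracy, len(items)))
    return rows


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Count-weighted average gap between confidence and accuracy per bin."""
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one labeled example")
    rows = calibration_table(confidences, correct, n_bins)
    return sum(count / total * abs(mean_conf - acc)
               for _, _, mean_conf, acc, count in rows)
```

A well-calibrated agent has rows where `mean_confidence` and `accuracy` are close and an error near zero. Recomputing the same number on recently labeled production samples and comparing it with the validation-time value is one simple way to watch for drift.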
Journey Context:
Many agent frameworks let you ask an agent 'how confident are you?' and use the answer to trigger human escalation. The problem: LLM confidence scores are notoriously miscalibrated. They tend to be overconfident, especially on inputs that resemble their training distribution but where the answer is actually wrong. A stated confidence of 0.8 might correspond to an actual accuracy of 0.4, which makes an escalation trigger set at 0.8 essentially random. The fix is to treat confidence calibration as a proper ML problem: (1) collect a validation set with ground truth for your specific task, (2) have the agent score its confidence on each item, (3) plot calibration curves to see how predicted confidence maps to actual accuracy, (4) apply temperature scaling (a single-parameter post-hoc calibration method) to recalibrate, (5) set thresholds based on the calibrated scores: if you want 95% accuracy before auto-approving, find the calibrated confidence level that corresponds to 95% actual accuracy. In production, monitor for calibration drift, because calibration degrades silently as the underlying model updates or the task distribution shifts. The tradeoff: this requires upfront investment in a validation set and ongoing monitoring, but without it, your escalation triggers are security theater.
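Steps (4) and (5) can be sketched as follows. This is a minimal illustration under stated assumptions: temperature scaling is normally applied to model logits, but an agent usually reports only a single confidence number, so this version applies the temperature to the logit of that number. The grid search, the helper names (`fit_temperature`, `apply_temperature`, `pick_threshold`) and the `min_support` parameter are hypothetical choices for the example.

```python
import math
from typing import Optional, Sequence

EPS = 1e-6  # keeps log() and the logit finite at confidences of 0 or 1


def _logit(p: float) -> float:
    p = min(max(p, EPS), 1 - EPS)
    return math.log(p / (1 - p))


def apply_temperature(confidence: float, temperature: float) -> float:
    """Rescale one confidence: sigmoid(logit(p) / T). T > 1 softens it."""
    return 1.0 / (1.0 + math.exp(-_logit(confidence) / temperature))


def negative_log_likelihood(confidences: Sequence[float],
                            correct: Sequence[bool],
                            temperature: float) -> float:
    """Mean binary NLL of the rescaled confidences against ground truth."""
    total = 0.0
    for p, ok in zip(confidences, correct):
        q = min(max(apply_temperature(p, temperature), EPS), 1 - EPS)
        total -= math.log(q) if ok else math.log(1 - q)
    return total / len(confidences)


def fit_temperature(confidences: Sequence[float],
                    correct: Sequence[bool]) -> float:
    """Pick the temperature with the lowest validation NLL.

    Assumption: a coarse grid from 0.05 to 10.0 is enough for a sketch;
    a real fit would likely use a proper 1-D optimizer.
    """
    grid = [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: negative_log_likelihood(confidences, correct, t))


def pick_threshold(calibrated: Sequence[float],
                   correct: Sequence[bool],
                   target_precision: float,
                   min_support: int = 1) -> Optional[float]:
    """Lowest threshold t such that items with confidence >= t reach the
    target accuracy on the validation set; None if no threshold qualifies."""
    pairs = sorted(zip(calibrated, correct), key=lambda pair: pair[0], reverse=True)
    best = None
    hits = 0
    for n, (conf, ok) in enumerate(pairs, start=1):
        hits += 1 if ok else 0
        # Only evaluate once all items tied at this confidence are counted.
        if n < len(pairs) and pairs[n][0] == conf:
            continue
        if n >= min_support and hits / n >= target_precision:
            best = conf
    return best
```

In use, the temperature would be fitted on the labeled validation set, applied to every production confidence, and anything below the chosen threshold would be escalated. A returned `None` means no confidence level reaches the target accuracy on the validation data, in which case auto-approval probably should not be enabled at all.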
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:53:53.384361+00:00— report_created — created