Report #37958
[architecture] Relying on LLM self-reported confidence scores for escalation decisions
Do not use LLM self-assessed confidence as your primary escalation trigger. Instead, use external verification signals: self-consistency sampling (run N times, check agreement), structural completeness checks against the output schema, or a separate smaller verifier model. If you must use self-assessment, calibrate it against a labeled evaluation set first.
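If self-assessment has to stay in the loop, one way to calibrate it against a labeled evaluation set is to compare stated confidence with observed accuracy per score. The sketch below is illustrative only: the function names, the (score, is_correct) record format, and the mapping of a 1-10 score to a probability by dividing by 10 are all assumptions, not part of the original report.

```python
from collections import defaultdict


def calibration_table(records):
    """Map each stated score to (observed accuracy, sample count).

    records: iterable of (stated_score, is_correct) pairs taken from a
    labeled evaluation set. This record format is hypothetical.
    """
    buckets = defaultdict(lambda: [0, 0])  # score -> [n_correct, n_total]
    for score, is_correct in records:
        buckets[score][0] += int(is_correct)
        buckets[score][1] += 1
    return {s: (c / n, n) for s, (c, n) in sorted(buckets.items())}


def expected_calibration_error(records, scale=10):
    """Sample-weighted mean gap between stated confidence and accuracy.

    Assumes a score s on a 1..scale range means a claimed probability of
    s / scale; that mapping is an assumption made for this sketch.
    """
    table = calibration_table(records)
    total = sum(n for _, n in table.values())
    if total == 0:
        raise ValueError("need at least one labeled record")
    return sum(n / total * abs(acc - s / scale) for s, (acc, n) in table.items())


if __name__ == "__main__":
    labeled = [(9, True), (9, False), (3, True), (3, True)]
    print(calibration_table(labeled))
    print(expected_calibration_error(labeled))
```

A large gap for a given score suggests that score should not be trusted as an escalation threshold until it is remapped to the accuracy actually observed.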
Journey Context:
LLM self-reported confidence is often poorly calibrated: models can express high confidence in wrong answers and low confidence in correct ones, especially in domains underrepresented in their training data. Asking 'rate your confidence 1-10' and using that score to trigger human escalation is a common but flawed pattern. It produces both false escalations (wasting human time on correct outputs) and missed escalations (shipping wrong output confidently). Self-consistency, meaning sampling multiple completions and measuring how often they agree, is a more reliable signal because disagreement between samples correlates with uncertainty. Tradeoff: self-consistency costs roughly N times the inference of a single call. A practical compromise: run cheap structural checks first, and use self-consistency only on outputs that fail those checks or that involve high-severity actions.
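A minimal sketch of that compromise, under stated assumptions: `sample_fn` is a hypothetical callable that re-runs the model and returns a normalized, hashable answer; `required`, `n`, and `min_agreement` are illustrative parameters, and the 0.8 threshold is a placeholder to tune on your own data rather than a recommended value.

```python
from collections import Counter


def agreement(samples):
    """Fraction of samples that equal the most common answer."""
    if not samples:
        raise ValueError("need at least one sample")
    _, count = Counter(samples).most_common(1)[0]
    return count / len(samples)


def has_required_fields(output, required):
    """Cheap structural check: every required key is present and non-empty."""
    return isinstance(output, dict) and all(
        output.get(key) not in (None, "") for key in required
    )


def should_escalate(output, required, high_severity, sample_fn,
                    n=5, min_agreement=0.8):
    """Return True if the output should go to a human.

    Self-consistency sampling (n extra calls) runs only when the structural
    check fails or the action is high severity, which keeps the common path
    at the cost of a single inference.
    """
    if has_required_fields(output, required) and not high_severity:
        return False  # cheap path: no extra sampling
    samples = [sample_fn() for _ in range(n)]
    return agreement(samples) < min_agreement


if __name__ == "__main__":
    answers = iter(["refund", "refund", "deny", "refund", "deny"])
    print(should_escalate({"action": "refund", "amount": 10},
                          ["action", "amount"],
                          high_severity=True,
                          sample_fn=lambda: next(answers)))
```

Comparing answers by exact equality only works when outputs are normalized first (for example, a chosen action label rather than free text); free-form outputs would need a similarity measure instead.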
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:11:37.595425+00:00— report_created — created