Report #56591
[architecture] False security when using raw LLM confidence scores to route decisions without calibration
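Use conformal prediction on a held-out calibration set to turn raw scores into prediction sets with coverage guarantees. Temperature scaling, fit on a separate held-out split, can improve calibration beforehand, but on its own it provides no coverage guarantee. Set escalation thresholds from the conformal quantile rather than from arbitrary softmax probabilities.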
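A minimal sketch of temperature scaling follows. It assumes per-class logits and integer labels for a held-out calibration split. The function names and the grid-search fit are illustrative choices; gradient-based fitting of T is more common. Pure Python keeps it dependency-free. Dividing all logits by one T leaves the argmax unchanged and only softens or sharpens the probabilities, so this improves average calibration without giving any coverage guarantee.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits_batch: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        p = softmax(logits, temperature)[y]
        total -= math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return total / len(labels)


def fit_temperature(logits_batch, labels, grid=None) -> float:
    """Hypothetical helper: pick T minimizing calibration-set NLL by grid search."""
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # T in (0, 10]
    return min(grid, key=lambda t: nll(logits_batch, labels, t))


# Toy calibration split: the model always puts logit 4 on class 0 but is right only half the time.
cal_logits = [[4.0, 0.0, 0.0]] * 4
cal_labels = [0, 0, 1, 2]
t = fit_temperature(cal_logits, cal_labels)
print(round(softmax([4.0, 0.0, 0.0])[0], 3))     # 0.965
print(round(softmax([4.0, 0.0, 0.0], t)[0], 2))  # 0.5
```

On this toy data the raw top-class probability is about 0.96 even though the model is right only half the time. The fitted T (close to 5.8) pulls it to roughly 0.5.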
Journey Context:
LLM confidence scores (softmax probabilities) are often poorly calibrated: a high score frequently does not correspond to high accuracy. Developers commonly route 'low confidence' requests to humans using an arbitrary threshold (e.g., p < 0.8), which provides no statistical guarantee. Conformal prediction provides finite-sample coverage guarantees. Given a calibration set that is exchangeable with incoming requests, you can construct prediction sets that contain the true answer with probability at least 1-α. This probability is averaged over requests, not guaranteed per request. Tradeoff: in ambiguous regions conformal prediction can produce large sets that require human review anyway. However, the miscoverage rate is bounded by α, which ad-hoc thresholds cannot promise.
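A minimal split-conformal sketch of the routing idea follows. It assumes per-class probabilities (for example, temperature-scaled softmax outputs) for a calibration split that is exchangeable with production traffic. The nonconformity score 1 - p(true label) and the "escalate unless the set is a single label" rule are illustrative choices, not the only options. All function names are hypothetical. With n calibration scores, q_hat is the ⌈(n+1)(1-α)⌉-th smallest one. A request's set is every label y with 1 - p(y) ≤ q_hat.

```python
import math
from typing import Sequence, Set


def conformal_threshold(
    cal_probs: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float
) -> float:
    """Split-conformal threshold q_hat from a held-out calibration set.

    Nonconformity score: s_i = 1 - p_i[y_i], one minus the probability the model
    gave the true label. q_hat is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    n = len(cal_labels)
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the full label set is valid.
        return math.inf
    return scores[k - 1]


def prediction_set(probs: Sequence[float], q_hat: float) -> Set[int]:
    """Labels whose score 1 - p(y) does not exceed q_hat."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= q_hat}


def route(probs: Sequence[float], q_hat: float) -> str:
    """Hypothetical escalation rule: automate only when the set is a single label."""
    return "auto" if len(prediction_set(probs, q_hat)) == 1 else "human"


# Toy calibration data: 9 examples, true label 0, varying model confidence.
p_true = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
cal_probs = [[p, (1 - p) / 2, (1 - p) / 2] for p in p_true]
cal_labels = [0] * len(p_true)
q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)  # 8th smallest score, 0.6
print(route([0.7, 0.2, 0.1], q_hat))     # auto
print(route([0.45, 0.45, 0.10], q_hat))  # human
```

Empty and multi-label sets both escalate. A wrong auto-routed answer is always a miscoverage event, because the true label lies outside a single-label set. Therefore the probability that a request is both auto-routed and wrong is at most α. The error rate among auto-routed requests alone is not bounded by α.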
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:28:45.818866+00:00— report_created — created