Report #39016
[architecture] When to escalate low-confidence agent outputs to human review
Implement calibrated confidence scores (Platt scaling fitted on a validation set) with threshold-based routing: when calibrated confidence falls below 0.7, trigger human-in-the-loop review or a secondary expert agent. Never cascade uncertain outputs downstream.
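A minimal sketch of the routing rule above, assuming the confidence value has already been calibrated. The names `Route`, `AgentOutput`, `route`, and `expert_available` are hypothetical and chosen only for illustration; the 0.7 default mirrors the threshold stated in this report.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    FORWARD = "forward"            # confident enough to pass downstream
    HUMAN_REVIEW = "human_review"  # resolve via human arbitration
    EXPERT_AGENT = "expert_agent"  # resolve via a secondary expert agent


@dataclass(frozen=True)
class AgentOutput:
    payload: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


def route(output: AgentOutput, threshold: float = 0.7,
          expert_available: bool = False) -> Route:
    """Pick a destination; outputs below the threshold never go downstream."""
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    if output.confidence >= threshold:
        return Route.FORWARD
    # Below threshold: resolve the uncertainty here instead of forwarding it.
    return Route.EXPERT_AGENT if expert_available else Route.HUMAN_REVIEW
```

In this sketch the only path to `Route.FORWARD` is a confidence at or above the threshold, so the "never cascade" rule is enforced by the function's structure rather than by caller discipline.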
Journey Context:
Raw LLM logits tend to be overconfident. Routing on uncalibrated scores causes either alert fatigue (too many human reviews) or missed errors, depending on where the threshold lands. Majority voting is a costly alternative because it multiplies inference calls. Platt scaling or isotonic regression, fitted on a hold-out set, maps raw scores to probabilities that match observed accuracy. The 0.7 threshold should be tuned per cost-of-error, but the architectural invariant is 'never forward uncertainty': uncertainty must be resolved at the boundary, via rejection or human arbitration.
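The calibration step can be sketched in plain Python. This is an illustrative implementation of Platt scaling (a two-parameter logistic fit by gradient descent), not the code of any particular library; `fit_platt`, `calibrate`, and `threshold_from_costs` are hypothetical names. The threshold helper assumes one simple cost model that is not part of this report: forward an output only when its expected error cost, (1 - p) * error_cost, does not exceed the cost of a review.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss.

    scores: raw, uncalibrated confidence values from a hold-out set.
    labels: 1 if the agent output was actually correct, else 0.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = _sigmoid(a * s + b) - y  # d(log loss)/d(logit)
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated probability of being correct."""
    return _sigmoid(a * score + b)


def threshold_from_costs(review_cost, error_cost):
    """Threshold under the assumed cost model: forward when
    (1 - p) * error_cost <= review_cost, i.e. p >= 1 - review_cost / error_cost.
    """
    if not 0 < review_cost <= error_cost:
        raise ValueError("expected 0 < review_cost <= error_cost")
    return 1.0 - review_cost / error_cost
```

Under this assumed cost model, a review that costs 3 units against an error that costs 10 yields a threshold near 0.7. A fixed number of gradient steps is used here for brevity; a production fit would more likely use a tested library routine and check convergence on the hold-out set.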
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:57:31.730296+00:00— report_created — created