Report #39445
[architecture] When to escalate to human review versus auto-approving in multi-agent chains
Implement calibrated confidence scoring with dynamic, per-step thresholds: fit Platt scaling or isotonic regression on a labeled validation set to map raw LLM logprobs to calibrated probabilities of correctness. Set each step's threshold from its expected loss, where Expected Loss = (Probability of Error) × (Cost of Error). Insert a human-in-the-loop gate only where the expected loss exceeds the cost of a human review, and add a circuit breaker that trips when the downstream error rate exceeds 5%, forcing human review of all subsequent outputs.
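A minimal sketch of the calibration and gating steps, using only the Python standard library. It assumes each agent output has a single raw score (here, hypothetically, the mean token logprob) and that a reviewer has labeled a validation set as correct (1) or incorrect (0). The data, costs, and function names (`fit_platt`, `calibrated_confidence`, `needs_human_review`) are illustrative, and eight validation points is far too few outside an example. Platt scaling is fitted here by plain gradient descent on log loss, without the target smoothing from Platt's original formulation; isotonic regression (for example scikit-learn's `IsotonicRegression`) is the non-parametric alternative the report mentions.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, epochs: int = 2000) -> tuple[float, float]:
    """Fit P(correct | score) = sigmoid(a * score + b) by minimizing log loss.

    scores: raw confidence per output (assumed here: mean token logprob).
    labels: 1 if a reviewer judged the output correct, else 0.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = _sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_confidence(score: float, a: float, b: float) -> float:
    # Map a raw score to a calibrated probability of correctness.
    return _sigmoid(a * score + b)


def needs_human_review(p_correct: float, cost_of_error: float,
                       review_cost: float) -> bool:
    # Expected loss of auto-approving = P(error) * cost of error.
    expected_loss = (1.0 - p_correct) * cost_of_error
    # Escalate only when the expected loss exceeds what a human review costs.
    return expected_loss > review_cost


# Hypothetical validation set: raw scores and reviewer correctness labels.
val_scores = [-0.05, -0.1, -0.2, -0.4, -0.8, -1.2, -1.5, -2.0]
val_labels = [1, 1, 1, 1, 0, 1, 0, 0]
a, b = fit_platt(val_scores, val_labels)

# Hypothetical step: an error costs 500, a human review costs 20 (same units).
p = calibrated_confidence(-0.3, a, b)
escalate = needs_human_review(p, cost_of_error=500.0, review_cost=20.0)
```

Rearranging the gate shows where the dynamic threshold comes from: auto-approve only when P(correct) ≥ 1 − (review cost / cost of error). With the hypothetical costs above (error 500, review 20), the step auto-approves only above 0.96 calibrated confidence, whereas a step whose errors cost 40 would auto-approve above 0.5.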
Journey Context:
Raw LLM logprobs are poorly calibrated as confidence scores (typically overconfident). Fixed thresholds (e.g., 0.5) ignore that different steps have different error costs. Always requiring human review defeats the purpose of automation; never requiring it lets errors cascade. Calibration makes '90% confident' mean that roughly 90% of outputs scored at that level are actually correct. The circuit breaker pattern (from Release It!) prevents error cascades when agent performance degrades. The tradeoff is the need for labeled validation data to fit the calibrator and the added latency of computing confidence, but that overhead is necessary for high-stakes agent chains.
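The circuit breaker can be sketched as a small state machine over a sliding window of downstream outcomes. The 5% threshold comes from the report; the window size, the minimum sample count before the breaker may trip, the class and method names, and the manual `reset()` are all hypothetical choices. Unlike the classic Release It! breaker, which moves to a half-open state after a timeout to probe for recovery, this version stays open until someone resets it.

```python
from collections import deque


class ReviewCircuitBreaker:
    """Force all outputs to human review once the recent error rate is too high.

    Hypothetical design: a fixed-size sliding window of outcomes, a minimum
    number of samples before tripping, and an explicit manual reset.
    """

    def __init__(self, threshold: float = 0.05, window: int = 200,
                 min_samples: int = 50) -> None:
        self.threshold = threshold
        self.min_samples = min_samples
        # True = a downstream error was observed for that output.
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.open = False

    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def record(self, was_error: bool) -> None:
        self.outcomes.append(was_error)
        # Trip only once enough outcomes exist to estimate the rate.
        if len(self.outcomes) >= self.min_samples and self.error_rate() > self.threshold:
            self.open = True

    def route(self, auto_approve_ok: bool) -> str:
        # While open, every output goes to human review regardless of its score.
        if self.open or not auto_approve_ok:
            return "human_review"
        return "auto_approve"

    def reset(self) -> None:
        # Called after a human has investigated the degradation.
        self.outcomes.clear()
        self.open = False
```

Requiring `min_samples` keeps a single early error from tripping the breaker (1 error in 3 outputs is already 33%); the cost is that errors in the first few outputs after deployment cannot trip it until that many outcomes have been recorded.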
Lifecycle
2026-06-18T20:41:07.615589+00:00— report_created — created